(57) The present invention provides DNA molecules that constitute fragments of the genome of a plant, and
polypeptides encoded thereby. The DNA molecules are useful for specifying a gene product in cells, either as a
promoter or as a protein coding sequence or as a UTR or as a 3' termination sequence, and are also useful in
controlling the behavior of a gene in the chromosome, in controlling the expression of a gene or as tools for
genetic mapping, recognizing or isolating identical or related DNA fragments, or identification of a particular
individual organism, or for clustering of a group of organisms with a common trait.

Arabidopsis DNA is used in the present experiment, but the procedure is a general one.
Description

FIELD OF THE INVENTION 

[0001] The present invention relates to isolated polynucleotides that represent a complete gene, or a fragment
thereof, that is expressed. In addition, the present invention relates to the polypeptide or protein corresponding
to the coding sequence of these polynucleotides. The present invention also relates to isolated polynucleotides that
represent regulatory regions of genes. The present invention also relates to isolated polynucleotides that represent
untranslated regions of genes. The present invention further relates to the use of these isolated polynucleotides
and polypeptides and proteins.

DESCRIPTION OF THE RELATED ART 

[0002] Efforts to map and sequence the genome of a number of organisms are in progress; a few complete genome
sequences, for example those of E. coli and Saccharomyces cerevisiae, are known (Blattner et al., Science 277:1453
(1997); Goffeau et al., Science 274:546 (1996)). The complete genome of a multicellular organism, C. elegans, has
also been sequenced (See, the C. elegans Sequencing Consortium, Science 282:2012 (1998)). To date, no complete
genome of a plant has been sequenced, nor has a complete cDNA complement of any plant been sequenced.

SUMMARY OF THE INVENTION

[0003] The present invention comprises polynucleotides, such as complete cDNA sequences and/or sequences of
genomic DNA encompassing complete genes, fragments of genes, and/or regulatory elements of genes and/or regions
with other functions and/or intergenic regions, hereinafter collectively referred to as Sequence-Determined DNA
Fragments (SDFs), from different plant species, particularly corn, wheat, soybean, rice and Arabidopsis thaliana,
and other plants, and/or mutants, variants, fragments or fusions of said SDFs and polypeptides or proteins derived
therefrom. In some instances, the SDFs span the entirety of a protein-coding segment. In some instances, the
entirety of an mRNA is represented. Other objects of the invention that are also represented by SDFs of the
invention are control sequences, such as, but not limited to, promoters. Complements of any sequence of the
invention are also considered part of the invention.

[0004] Other objects of the invention are polynucleotides comprising exon sequences, polynucleotides comprising
intron sequences, polynucleotides comprising introns together with exons, intron/exon junction sequences, 5'
untranslated sequences, and 3' untranslated sequences of the SDFs of the present invention. Polynucleotides
representing the joinder of any exons described herein, in any arrangement, for example, to produce a sequence
encoding any desirable amino acid sequence are within the scope of the invention.

[0005] The present invention also resides in probes useful for isolating and identifying nucleic acids that hybridize
to an SDF of the invention. The probes can be of any length, but more typically are 12 to 2000 nucleotides in length;
more typically, 15 to 200 nucleotides long; even more typically, 18 to 100 nucleotides long.

[0006] Yet another object of the invention is a method of isolating and/or identifying nucleic acids using the
following steps:

(a) contacting a probe of the instant invention with a polynucleotide sample under conditions that permit
hybridization and formation of a polynucleotide duplex; and

(b) detecting and/or isolating the duplex of step (a).

[0007] The conditions for hybridization can be from low to moderate to high stringency conditions. The sample can
include a polynucleotide having a sequence unique in a plant genome. Probes and methods of the invention are useful,
for example, without limitation, for mapping of genetic traits and/or for positional cloning of a desired fragment of
genomic DNA.

[0008] Probes and methods of the invention can also be used for detecting alternatively spliced messages within a
species. Probes and methods of the invention can further be used to detect or isolate related genes in other plant
species using genomic DNA (gDNA) and/or cDNA libraries. In some instances, especially when longer probes and low
to moderate stringency hybridization conditions are used, the probe will hybridize to a plurality of cDNA and/or gDNA
sequences of a plant. This approach is useful for isolating representatives of gene families which are identifiable by
possession of a common functional domain in the gene product or which have common cis-acting regulatory sequences.
This approach is also useful for identifying orthologous genes from other organisms.

[0009] The present invention also resides in constructs for modulating the expression of the genes comprised of all
or a fragment of an SDF. The constructs comprise all or a fragment of the expressed SDF, or of a complementary
sequence. Examples of constructs include ribozymes comprising RNA encoded by an SDF or by a sequence
complementary thereto, antisense constructs, constructs comprising coding regions or parts thereof, constructs
comprising promoters, introns, untranslated regions, scaffold attachment regions, methylating regions, enhancing or
reducing regions, DNA and chromatin conformation modifying sequences, etc. Such constructs can be constructed using
viral, plasmid, bacterial artificial chromosomes (BACs), plasmid artificial chromosomes (PACs), autonomous plant
plasmids, plant artificial chromosomes or other types of vectors and exist in the plant as autonomous replicating
sequences or as DNA integrated into the genome. When inserted into a host cell the construct is, preferably,
functionally integrated with, or operatively linked to, a heterologous polynucleotide. For instance, a coding region
from an SDF might be operably linked to a promoter that is functional in a plant.

[0010] The present invention also resides in host cells, including bacterial or yeast cells or plant cells, and plants
that harbor constructs such as described above. Another aspect of the invention relates to methods for modulating
expression of specific genes in plants by expression of the coding sequence of the constructs, by regulation of
expression of one or more endogenous genes in a plant or by suppression of expression of the polynucleotides of the
invention in a plant. Methods of modulation of gene expression include without limitation (1) inserting into a host
cell additional copies of a polynucleotide comprising a coding sequence; (2) modulating an endogenous promoter in a
host cell; (3) inserting antisense or ribozyme constructs into a host cell; and (4) inserting into a host cell a
polynucleotide comprising a sequence encoding a variant, fragment, or fusion of the native polypeptides of the
instant invention.

BRIEF DESCRIPTION OF THE TABLES 

[0011] The sequences of exemplary SDFs and polypeptides corresponding to the coding sequences of the instant
invention are described in Reference Tables 1 and 2, "REF Tables 1 and 2"; and in Sequence Tables 1 and 2, "SEQ
Tables 1 and 2." The REF Tables refer to a number of "Maximum Length Sequences" or "MLS." Each MLS corresponds
to the longest cDNA obtained, either by cloning or by the prediction from genomic sequence. The sequence of the
MLS is the cDNA sequence as described in the (Ac) subsection of the REF Tables.
[0012] The REF Table includes the following information relating to each MLS:

I. cDNA Sequence

A. 5' UTR
B. Coding Sequence
C. 3' UTR

II. Genomic Sequence

A. Exons
B. Introns
C. Promoters

III. Link of cDNA Sequences to Clone IDs

IV. Multiple Transcription Start Sites

V. Polypeptide Sequences

A. Signal Peptide
B. Domains
C. Related Polypeptides

VI. Related Polynucleotide Sequences

I. cDNA SEQUENCE

[0013] The REF Tables indicate which sequence in the SEQ Tables represents the sequence of each MLS. The MLS
sequence can comprise 5' and 3' UTR as well as coding sequences. In addition, specific cDNA clone numbers also
are included in the REF Tables when the MLS sequence relates to a specific cDNA clone.

A. 5' UTR

[0014] The location of the 5' UTR can be determined by comparing the most 5' MLS sequence with the corresponding
genomic sequence as indicated in the REF Tables. The sequence that matches, beginning at any of the transcriptional
start sites and ending at the last nucleotide before any of the translational start sites, corresponds to the 5' UTR.

B. Coding Region

[0015] The coding region is the sequence in any open reading frame found in the MLS. Coding regions of interest
are indicated in the PolyP SEQ subsection of the REF Tables.

C. 3' UTR

[0016] The location of the 3' UTR can be determined by comparing the most 3' MLS sequence with the corresponding
genomic sequence as indicated in the REF Tables. The sequence that matches, beginning at the translational stop
site and ending at the last nucleotide of the MLS, corresponds to the 3' UTR.
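
The partition of an MLS into 5' UTR, coding region and 3' UTR described above can be illustrated with a short sketch. The function name `split_mls`, the 1-based inclusive coordinates, and the choice to start the 3' UTR immediately after the stop codon are assumptions made for illustration only; the REF Tables may record these positions differently.

```python
def split_mls(mls, tss, start, stop_end):
    """Split an MLS into (5' UTR, coding sequence, 3' UTR).

    Hypothetical helper. All positions are 1-based and inclusive:
      tss      - transcription start site within the MLS
      start    - first base of the translational start codon
      stop_end - last base of the stop codon (assumed boundary of the 3' UTR)
    """
    if not 1 <= tss <= start <= stop_end <= len(mls):
        raise ValueError("expected 1 <= tss <= start <= stop_end <= len(mls)")
    five_utr = mls[tss - 1:start - 1]   # from the TSS up to the base before the start codon
    coding = mls[start - 1:stop_end]    # start codon through stop codon
    three_utr = mls[stop_end:]          # everything after the stop codon
    return five_utr, coding, three_utr


# Toy sequence: 4-base 5' UTR, 9-base open reading frame, 4-base 3' UTR.
example = "GGCC" + "ATGAAATAG" + "TTTT"
parts = split_mls(example, tss=1, start=5, stop_end=13)
```

Because an MLS can have several transcriptional and translational start sites, such a function would be called once per combination of interest.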

II. GENOMIC SEQUENCE

[0017] Further, the REF Tables indicate the specific "gi" number of the genomic sequence if the sequence resides in
a public databank. For each genomic sequence, the REF Tables indicate which regions are included in the MLS. These
regions can include the 5' and 3' UTRs as well as the coding sequence of the MLS. See, for example, the scheme below:



[Scheme not reproduced. Labels: Region 1, Region 2, Region 3; 5' UTR, Exon, Exon, Exon, 3' UTR; Promoter, Intron,
Intron; Translational Start Site; Stop Codon.]



[0018] The REF Tables report the first and last base of each region that are included in an MLS sequence. An example
is shown below:

gi No. 47000:
37102 ... 37497
37593 ... 37925

The numbers indicate that the MLS contains the following sequences from two regions of gi No. 47000: a first region
including bases 37102-37497, and a second region including bases 37593-37925.
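
As an illustration of how such region listings might be read programmatically, the following minimal sketch parses "first ... last" lines into coordinate pairs. The regular expression and the function names `parse_regions` and `region_length` are assumptions for this example; the actual REF Table layout may differ.

```python
import re

# Assumed line format: two integers separated by two or more dots, with optional spaces.
REGION = re.compile(r"(\d+)\s*\.{2,}\s*(\d+)")


def parse_regions(text):
    """Return a list of (first_base, last_base) tuples found in the text."""
    return [(int(first), int(last)) for first, last in REGION.findall(text)]


def region_length(first, last):
    """Number of bases in a region, whichever strand it is reported on."""
    return abs(last - first) + 1


listing = """gi No. 47000:
37102 ... 37497
37593 ... 37925"""

regions = parse_regions(listing)
lengths = [region_length(first, last) for first, last in regions]
```

The header line carrying the gi number is ignored here because it contains no dotted range.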

A. EXON SEQUENCES

[0019] The location of the exons can be determined by comparing the sequence of the regions from the genomic
sequences with the corresponding MLS sequence as indicated by the REF Tables.

i. INITIAL EXON

[0020] To determine the location of the initial exon, information from the

(1) polypeptide sequence section;
(2) cDNA polynucleotide section; and
(3) the genomic sequence section

of the REF Tables is used. First, the polypeptide section will indicate where the translational start site is located in
the MLS sequence. The MLS sequence can be matched to the genomic sequence that corresponds to the MLS. Based
on the match between the MLS and corresponding genomic sequences, the location of the translational start site can
be determined in one of the regions of the genomic sequence. The location of this translational start site is the start of
the first exon.

[0021] Generally, the last base of the exon of the corresponding genomic region, in which the translational start site
was located, will represent the end of the initial exon. In some cases, the initial exon will end with a stop codon, when
the initial exon is the only exon.

[0022] In the case when sequences representing the MLS are in the positive strand of the corresponding genomic
sequence, the last base will be a larger number than the first base. When the sequences representing the MLS are in
the negative strand of the corresponding genomic sequence, then the last base will be a smaller number than the first
base.
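
The strand rule just stated reduces to a comparison of the two reported coordinates. A minimal sketch follows; the function name `strand_of` and the "+"/"-" labels are illustrative assumptions, not terms used in the REF Tables.

```python
def strand_of(first_base, last_base):
    """Infer the genomic strand of a region from its reported first and last base.

    Positive strand: last base is the larger number.
    Negative strand: last base is the smaller number.
    """
    if last_base > first_base:
        return "+"
    if last_base < first_base:
        return "-"
    # A single-base region carries no orientation information.
    raise ValueError("cannot infer strand when first and last base are equal")


forward = strand_of(37102, 37497)
reverse = strand_of(37497, 37102)
```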

ii. INTERNAL EXONS

[0023] Except for the regions that comprise the 5' and 3' UTRs, initial exon, and terminal exon, the remaining genomic
regions that match the MLS sequence are the internal exons. Specifically, the bases defining the boundaries of the
remaining regions also define the intron/exon junctions of the internal exons.

iii. TERMINAL EXON 

[0024] As with the initial exon, the location of the terminal exon is determined with information from the

(1) polypeptide sequence section;
(2) cDNA polynucleotide section; and
(3) the genomic sequence section

of the REF Tables. The polypeptide section will indicate where the stop codon is located in the MLS sequence. The
MLS sequence can be matched to the corresponding genomic sequence. Based on the match between MLS and
corresponding genomic sequences, the location of the stop codon can be determined in one of the regions of the
genomic sequence. The location of this stop codon is the end of the terminal exon. Generally, the first base of the exon
of the corresponding genomic region that matches the cDNA sequence, in which the stop codon was located, will
represent the beginning of the terminal exon. In some cases, the translational start site will represent the start of the
terminal exon, which will be the only exon.

[0025] In the case when the MLS sequences are in the positive strand of the corresponding genomic sequence, the
last base will be a larger number than the first base. When the MLS sequences are in the negative strand of the
corresponding genomic sequence, then the last base will be a smaller number than the first base.

B. INTRON SEQUENCES 

[0026] In addition, the introns corresponding to the MLS are defined by identifying the genomic sequence located
between the regions where the genomic sequence comprises exons. Thus, introns are defined as starting one base
downstream of a genomic region comprising an exon, and ending one base upstream from a genomic region comprising
an exon.
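
The intron definition above can be sketched as follows. This is an illustrative helper, not part of the REF Tables: the name `introns` is hypothetical, regions are assumed to be given in transcript order as (first base, last base) pairs, and consecutive regions are assumed to be separated by at least one base.

```python
def introns(regions):
    """Derive intron coordinates from consecutive exon-containing regions.

    For positive-strand regions (last > first) an intron runs from last+1 of
    one region to first-1 of the next; for negative-strand regions the
    coordinates decrease, so the offsets are mirrored.
    """
    result = []
    for (first_a, last_a), (first_b, _last_b) in zip(regions, regions[1:]):
        step = 1 if last_a >= first_a else -1
        result.append((last_a + step, first_b - step))
    return result


# Coordinates taken from the two-region example given for gi No. 47000.
plus_strand = introns([(37102, 37497), (37593, 37925)])
# The same regions as they would be reported on the negative strand.
minus_strand = introns([(37925, 37593), (37497, 37102)])
```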

C. PROMOTER SEQUENCES 

[0027] As indicated below, promoter sequences corresponding to the MLS are defined as sequences upstream of
the first exon; more usually, as sequences upstream of the first of multiple transcription start sites; even more usually
as sequences about 2,000 nucleotides upstream of the first of multiple transcription start sites.
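
A promoter window of about 2,000 nucleotides upstream of a transcription start site could be computed along the following lines. The function `promoter_window`, its parameters, and the clipping at the ends of the genomic sequence are assumptions made for this sketch; coordinates are 1-based and inclusive.

```python
def promoter_window(tss, strand="+", length=2000, seq_len=None):
    """Return (low, high) genomic coordinates of the region upstream of a TSS.

    On the positive strand "upstream" means smaller coordinates; on the
    negative strand it means larger coordinates. seq_len, if given, is the
    length of the genomic sequence and is used to clip the window.
    """
    if strand == "+":
        return max(1, tss - length), tss - 1
    if strand == "-":
        high = tss + length
        if seq_len is not None:
            high = min(high, seq_len)
        return tss + 1, high
    raise ValueError("strand must be '+' or '-'")


window = promoter_window(37102)                      # full 2,000-base window
clipped = promoter_window(500)                       # clipped at base 1
reverse = promoter_window(100, "-", seq_len=1000)    # clipped at sequence end
```

When several transcription start sites are listed, the most upstream one would be passed as `tss` to follow the "first of multiple transcription start sites" wording.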

III. LINK of cDNA SEQUENCES to CLONE IDs 

[0028] As noted above, the REF tables identify the cDNA clone(s) that relate to each MLS. The MLS sequence can
be longer than the sequences included in the cDNA clones. In such a case, the REF table indicates the region of the
MLS that is included in the clone. If either the 5' or 3' termini of the cDNA clone sequence is the same as the MLS
sequence, no mention will be made.

IV. Multiple Transcription Start Sites

[0029] Initiation of transcription can occur at a number of sites of the gene. The REF tables indicate the possible
multiple transcription sites for each gene. In the REF tables, the location of the transcription start sites can be either
a positive or negative number. The positions indicated by positive numbers refer to the transcription start sites as
located in the MLS sequence. The negative numbers indicate the transcription start site within the genomic sequence
that corresponds to the MLS.

[0030] To determine the location of the transcription start sites with the negative numbers, the MLS sequence is
aligned with the corresponding genomic sequence. In the instances when a public genomic sequence is referenced,
the relevant corresponding genomic sequence can be found by direct reference to the nucleotide sequence indicated
by the "gi" number shown in the public genomic DNA section of the REF tables. When the position is a negative number,
the transcription start site is located in the corresponding genomic sequence upstream of the base that matches the
beginning of the MLS sequence in the alignment. The negative number is relative to the first base of the MLS sequence
which matches the genomic sequence corresponding to the relevant "gi" number.

[0031] In the instances when no public genomic DNA is referenced, the relevant nucleotide sequence for alignment
is the nucleotide sequence associated with the amino acid sequence designated by "gi" number of the later PolyP SEQ
subsection.
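
The handling of negative transcription start site positions can be sketched as a coordinate conversion. The function `tss_genomic_position` is hypothetical, and it assumes that a position of -1 denotes the genomic base immediately upstream of the base aligned with the first MLS base; the tables themselves do not state this offset convention explicitly.

```python
def tss_genomic_position(tss, anchor, strand="+"):
    """Convert a negative TSS position into a genomic coordinate.

    tss    - negative position as listed for the transcription start site
    anchor - genomic coordinate of the base that aligns with MLS base 1
    strand - '+' if the MLS matches the positive genomic strand, '-' otherwise
    """
    if tss >= 0:
        # Positive positions are MLS coordinates and need no genomic lookup.
        raise ValueError("only negative positions refer to the genomic sequence")
    if strand == "+":
        return anchor + tss   # upstream means smaller coordinates
    if strand == "-":
        return anchor - tss   # upstream means larger coordinates
    raise ValueError("strand must be '+' or '-'")


plus_site = tss_genomic_position(-25, anchor=37102)
minus_site = tss_genomic_position(-25, anchor=37925, strand="-")
```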

V. Polypeptide Sequences 

[0032] The PolyP SEQ subsection lists SEQ ID NOs and Ceres SEQ ID NO for polypeptide sequences corresponding
to the coding sequence of the MLS sequence and the location of the translational start site within the coding sequence
of the MLS sequence.

[0033] The MLS sequence can have multiple translational start sites and can be capable of producing more than
one polypeptide sequence.

A. Signal Peptide 

[0034] The REF Tables also indicate in subsection (B) the cleavage site of the putative signal peptide of the
polypeptide corresponding to the coding sequence of the MLS sequence. Typically, signal peptide coding sequences
comprise a sequence encoding the first residue of the polypeptide to the cleavage site residue.

B. Domains

[0035] Subsection (C) provides information regarding identified domains (where present) within the polypeptide and
(where present) a name for the polypeptide domain.

C. Related Polypeptides 

[0036] Subsection (Dp) provides (where present) information concerning amino acid sequences that are found to be
related and have some percentage of sequence identity to the polypeptide sequences of REF and SEQ TABLES 1
AND 2. These related sequences are identified by a "gi" number.

VI. Related Polynucleotide Sequences

[0037] Subsection (Dn) provides polynucleotide sequences (where present) that are related to and have some
percentage of sequence identity to the MLS or corresponding genomic sequence.



Abbreviation                        Description

Max Len. Seq.                       Maximum Length Sequence
rel to                              Related to
Clone Ids                           Clone ID numbers
Pub gDNA                            Public Genomic DNA
gi No.                              gi number
Gen. seq. in cDNA                   Genomic Sequence in cDNA (Each region for a single gene prediction is
                                    listed on a separate line. In the case of multiple gene predictions, the
                                    group of regions relating to a single prediction are separated by a
                                    blank line)
(Ac) cDNA SEQ                       cDNA sequence
- Pat. Appln. SEQ ID NO             Patent Application SEQ ID NO:
- Ceres SEQ ID NO: 1673877          Ceres SEQ ID NO:
- SEQ # w. TSS                      Location within the cDNA sequence, SEQ ID NO:, of Transcription Start
                                    Sites which are listed below
- Clone ID #: # -> #                Clone ID comprises bases # to # of the cDNA Sequence
PolyP SEQ                           Polypeptide Sequence
- Pat. Appln. SEQ ID NO:            Patent Application SEQ ID NO:
- Ceres SEQ ID NO                   Ceres SEQ ID NO:
- Loc. SEQ ID NO: @ nt.             Location of translational start site in cDNA of SEQ ID NO: at nucleotide
                                    number
(C) Pred. PP Nom. & Annot.          Nomination and Annotation of Domains within Predicted Polypeptide(s)
- (Title)                           Name of Domain
- Loc. SEQ ID NO #: # -> # aa.      Location of the domain within the polypeptide of SEQ ID NO: from # to #
                                    amino acid residues.
(Dp) Rel. AA SEQ                    Related Amino Acid Sequences
- Align. NO                         Alignment number
- gi No                             Gi number
- Desp.                             Description
- % Idnt.                           Percent identity
- Align. Len.                       Alignment Length
- Loc. SEQ ID NO: # -> # aa         Location within SEQ ID NO: from # to # amino acid residue.

DETAILED DESCRIPTION OF THE INVENTION 

[0038] The invention relates to (I) polynucleotides and methods of use thereof, such as

IA. Probes, Primers and Substrates;

IB. Methods of Detection and Isolation;

B.1. Hybridization;
B.2. Methods of Mapping;
B.3. Southern Blotting;
B.4. Isolating cDNA from Related Organisms;
B.5. Isolating and/or Identifying Orthologous Genes

IC. Methods of Inhibiting Gene Expression

C.1. Antisense
C.2. Ribozyme Constructs;
C.3. Chimeraplasts;
C.4. Co-Suppression;
C.5. Transcriptional Silencing
C.6. Other Methods to Inhibit Gene Expression

ID. Methods of Functional Analysis;

IE. Promoter Sequences and Their Use;

IF. UTRs and/or Intron Sequences and Their Use; and

IG. Coding Sequences and Their Use.

[0039] The invention also relates to (II) polypeptides and proteins and methods of use thereof, such as

IIA. Native Polypeptides and Proteins

A.1 Antibodies
A.2 In Vitro Applications

IIB. Polypeptide Variants, Fragments and Fusions

B.1 Variants
B.2 Fragments
B.3 Fusions

[0040] The invention also includes (III) methods of modulating polypeptide production, such as

IIIA. Suppression

A.1 Antisense
A.2 Ribozymes
A.3 Co-suppression
A.4 Insertion of Sequences into the Gene to be Modulated
A.5 Promoter Modulation
A.6 Expression of Genes containing Dominant-Negative Mutations

IIIB. Enhanced Expression

B.1 Insertion of an Exogenous Gene
B.2 Promoter Modulation

[0041] The invention further concerns (IV) gene constructs and vector construction, such as

IVA. Coding Sequences
IVB. Promoters
IVC. Signal Peptides

[0042] The invention still further relates to

V. Transformation Techniques

Definitions 

[0043] Allelic variant: An "allelic variant" is an alternative form of the same SDF, which resides at the same
chromosomal locus in the organism. Allelic variations can occur in any portion of the gene sequence, including regulatory
regions. Allelic variants can arise by normal genetic variation in a population. Allelic variants can also be produced by
genetic engineering methods. An allelic variant can be one that is found in a naturally occurring plant, including a
cultivar or ecotype. An allelic variant may or may not give rise to a phenotypic change, and may or may not be expressed.
An allele can result in a detectable change in the phenotype of the trait represented by the locus. A phenotypically
silent allele can give rise to a product.

[0044] Alternatively spliced messages: Within the context of the current invention, "alternatively spliced
messages" refers to mature mRNAs originating from a single gene with variations in the number and/or identity of exons,
introns and/or intron-exon junctions.

[0045] Chimeric: The term "chimeric" is used to describe genes, as defined supra, or constructs wherein at least
two of the elements of the gene or construct, such as the promoter and the coding sequence and/or other regulatory
sequences and/or filler sequences and/or complements thereof, are heterologous to each other.

[0046] Constitutive Promoter: Promoters referred to herein as "constitutive promoters" actively promote transcription
under most, but not necessarily all, environmental conditions and states of development or cell differentiation. Examples
[...] plant genes, such as the maize ubiquitin-1 promoter. [...] in the current invention, refers to [...]

Domain: Domains are fingerprints or signatures that can be used to [...] of proteins. Such fingerprints or
signatures [...] structure, and/or (3) three-dimensional [...] with either a family of proteins or motifs. Typically,
these [...] with specific in-vitro and/or in-vivo activities. A domain can be any length, [...] of a protein. Detailed
descriptions of the [...] of the instant invention are [...] any polypeptide that comprises the [...]

[0049] Endogenous: [...] context of the current invention refers to any polynucleotide, polypeptide or protein
sequence which is a natural [...]

[0050] Exogenous: "Exogenous," as referred to [...] polynucleotide, polypeptide or protein sequence, whether
chimeric or not, that is initially [...] of an individual host cell or the organism regenerated from said host [...]
than by a sexual cross. Examples of means by which this can be accomplished are described below, and [...]
transformation (of dicots - e.g. Salomon et al. EMBO J. 3:141 (1984); [...] 2:987 (1983); of monocots,
representative [...] Biotechnology 14:745 (1996), May [...]

[...] sequence contiguously associated with a [...] can include non-coding sequences that modulate [...] to, those
that specify polyadenylation, transcriptional regulation, DNA [...] conformation, extent and position of base
methylation and binding sites of proteins that control all of [...] which may be interrupted by "introns" (non-coding
sequences) [...] function may require only RNA expression or protein production, [...] acids without asso[...] in such
a way that one gene [...]

[...] is used in the current invention to describe a group of functionally related genes, each of which encodes a
separate protein.

[0054] Heterologous sequences: "Heterologous sequences" are those that are not operatively linked or are not
contiguous to each other in nature. For example, a promoter from corn is considered heterologous to an Arabidopsis
coding region sequence. Also, a promoter from a gene encoding [...] is considered heterologous to a sequence encoding
the corn receptor for the [...] sequences, such as UTRs or 3' end termination sequences that do not originate in
nature [...] the same gene as [...] are considered heterologous to [...] other are not heterologous to each other. On
the other hand, [...]






EP 1 033 405 A2 

[...] polynucleotides of the present invention, is PARSK1, the promoter from the Arabidopsis gene encoding a
serine-threonine kinase enzyme, and which promoter is induced by dehydration, abscisic acid and sodium chloride (Wang
and Goodman, Plant J. 8:37 (1995)). Examples of environmental conditions that may affect transcription by inducible
promoters include anaerobic conditions, elevated temperature, or the presence of light.
[0057] Intergenic region: "Intergenic region," as used in the current invention, refers to nucleotide sequence
occurring in the genome that separates adjacent genes.

[0058] Mutant gene: In the current invention, "mutant" refers to a heritable change in DNA sequence at a specific
location. Mutants of the current invention may or may not have an associated identifiable function when the mutant
gene is transcribed.

[0059] Orthologous Gene: In the current invention "orthologous gene" refers to a second gene that encodes a
gene product that performs a similar function as the product of a first gene. The orthologous gene may also have a
degree of sequence similarity to the first gene. The orthologous gene may encode a polypeptide that exhibits a degree
of sequence similarity to a polypeptide corresponding to a first gene. The sequence similarity can be found within a
functional domain or along the entire length of the coding sequence of the genes and/or their corresponding
polypeptides.

[0060] Percentage of sequence identity: "Percentage of sequence identity," as used herein, is determined by
comparing two optimally aligned sequences over a comparison window, where the fragment of the polynucleotide or
amino acid sequence in the comparison window may comprise additions or deletions (e.g., gaps or overhangs) as
compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two
sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid
base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number
of matched positions by the total number of positions in the window of comparison and multiplying the result by 100
to yield the percentage of sequence identity. Optimal alignment of sequences for comparison may be conducted by
the local homology algorithm of Smith and Waterman Adv. Appl. Math. 2:482 (1981), by the homology alignment
algorithm of Needleman and Wunsch J. Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson and
Lipman Proc. Natl. Acad. Sci. (USA) 85:2444 (1988), by computerized implementations of these algorithms (GAP,
BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group
(GCG), 575 Science Dr., Madison, WI), or by inspection. Given that two sequences have been identified for comparison,
GAP and BESTFIT are preferably employed to determine their optimal alignment. Typically, the default values of 5.00
for gap weight and 0.30 for gap weight length are used. The term "substantial sequence identity" between polynucleotide
or polypeptide sequences refers to polynucleotide or polypeptide comprising a sequence that has at least 80%
sequence identity, preferably at least 85%, more preferably at least 90% and most preferably at least 95%, even more
preferably, at least 96%, 97%, 98% or 99% sequence identity compared to a reference sequence using the programs.
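
The arithmetic of the percentage calculation can be shown with a short sketch. It assumes the two sequences have already been optimally aligned by one of the programs named above and are supplied as equal-length strings in which "-" marks a gap; the function name `percent_identity` is illustrative and the sketch does not perform the alignment itself.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of sequence identity over an aligned comparison window.

    Matched positions are those where both sequences carry the same base or
    residue; gap positions never count as matches. The denominator is the
    total number of positions in the window.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    matched = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * matched / len(aligned_a)


# Nine-position window with one gap and one mismatch: 7 matched positions.
identity = percent_identity("ACGT-ACGT", "ACGTTACGA")
# Threshold taken from the definition of "substantial sequence identity".
is_substantial = identity >= 80.0
```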
[0061] Plant Promoter: A "plant promoter" is a promoter capable of initiating transcription in plant cells and can
drive or facilitate transcription of a fragment of the SDF of the instant invention or a coding sequence of the SDF of the
instant invention. Such promoters need not be of plant origin. For example, promoters derived from plant viruses, such
as the CaMV35S promoter or from Agrobacterium tumefaciens such as the T-DNA promoters, can be plant promoters.
A typical example of a plant promoter of plant origin is the maize ubiquitin-1 (ubi-1) promoter known to those of skill.
[0062] Promoter: The term "promoter," as used herein, refers to a region of sequence determinants located
upstream from the start of transcription of a gene and which are involved in recognition and binding of RNA polymerase
and other proteins to initiate and modulate transcription. A basal promoter is the minimal sequence necessary for
assembly of a transcription complex required for transcription initiation. Basal promoters frequently include a "TATA
box" element usually located between 15 and 35 nucleotides upstream from the site of initiation of transcription. Basal
promoters also sometimes include a "CCAAT box" element (typically a sequence CCAAT) and/or a GGGCG sequence,
usually located between 40 and 200 nucleotides, preferably 60 to 120 nucleotides, upstream from the start site of
transcription.
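
The positional windows given for basal promoter elements lend themselves to a simple scan of the sequence upstream of a transcription start site. The sketch below is illustrative only: `find_upstream_motif` is a hypothetical helper, the bare string "TATA" is an assumed stand-in for a TATA box, and distance is measured from the first base of the motif to the start site.

```python
def find_upstream_motif(upstream, motif, near, far):
    """Return positions (negative numbers) of motif starts within a window.

    upstream - sequence immediately 5' of the start site; its last base is
               position -1
    near/far - smallest and largest allowed distance upstream, inclusive
    """
    hits = []
    index = upstream.find(motif)
    while index != -1:
        distance = len(upstream) - index   # bases from motif start to the TSS
        if near <= distance <= far:
            hits.append(-distance)
        index = upstream.find(motif, index + 1)
    return hits


# Toy upstream sequences built so the motif positions are known by construction.
tata_region = "G" * 20 + "TATAAA" + "C" * 24      # motif starts 30 bases upstream
ccaat_region = "CCAAT" + "G" * 60                 # motif starts 65 bases upstream

tata_hits = find_upstream_motif(tata_region, "TATA", 15, 35)
ccaat_hits = find_upstream_motif(ccaat_region, "CCAAT", 40, 200)
```

A realistic search would use a position weight matrix or a degenerate consensus rather than an exact string match.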

[0063] Public sequence: The term "public sequence," as used in the context of the instant application, refers to
any sequence that has been deposited in a publicly accessible database. This term encompasses both amino acid
and nucleotide sequences. Such sequences are publicly accessible, for example, on the BLAST databases on the
NCBI FTP web site (accessible at ncbi.nlm.gov/blast). The database at the NCBI FTP site utilizes "gi" numbers assigned
by NCBI as a unique identifier for each sequence in the databases, thereby providing a non-redundant database for
sequences from various databases, including GenBank, EMBL, DDBJ (DNA Database of Japan) and PDB (Brookhaven
Protein Data Bank).

[0064] Regulatory Sequence: The term "regulatory sequence," as used in the current invention, refers to any
nucleotide sequence that influences transcription or translation initiation and rate, and stability and/or mobility of the
transcript or polypeptide product. Regulatory sequences include, but are not limited to, promoters, promoter control
elements, protein binding sequences, 5' and 3' UTRs, transcriptional start site, termination sequence, polyadenylation
sequence, introns, certain sequences within a coding sequence, etc.
[0065] Related Sequences: "Related sequences" refer to either a polypeptide or a nucleotide sequence that
exhibits some degree of sequence similarity with a sequence described by the REF and SEQ tables.

[0066] Scaffold Attachment Region (SAR): As used herein, "scaffold attachment region" is a DNA sequence that
anchors chromatin to the nuclear matrix or scaffold to generate loop domains that can have either a transcriptionally
active or inactive structure (Spiker and Thompson (1996) Plant Physiol. 110: 15-21).

[0067] Sequence-determined DNA fragments (SDFs): "Sequence-determined DNA fragments" as used in the
current invention are isolated sequences of genes, fragments of genes, intergenic regions or contiguous DNA from
plant genomic DNA or cDNA or RNA, the sequence of which has been determined.

[0068] Signal Peptide: A "signal peptide" as used in the current invention is an amino acid sequence that targets
the protein for secretion, for transport to an intracellular compartment or organelle or for incorporation into a membrane.
Signal peptides are indicated in the tables and a more detailed description is located below.

[0069] Specific Promoter: In the context of the current invention, "specific promoters" refers to a subset of inducible
promoters that have a high preference for being induced in a specific tissue or cell and/or at a specific time during
development of an organism. By "high preference" is meant at least 3-fold, preferably 5-fold, more preferably at least
10-fold, still more preferably at least 20-fold, 50-fold or 100-fold increase in transcription in the desired tissue over the
transcription in any other tissue. Typical examples of temporal and/or tissue specific promoters of plant origin that can
be used with the polynucleotides of the present invention are: PTA29, a promoter which is capable of driving gene
transcription specifically in tapetum and only during anther development (Koltonow et al., Plant Cell 2:1201 (1990));
RCc2 and RCc3, promoters that direct root-specific gene transcription in rice (Xu et al., Plant Mol. Biol. 27:237 (1995));
and TobRB27, a root-specific promoter from tobacco (Yamamoto et al., Plant Cell 3:371 (1991)). Examples of tissue-specific
promoters under developmental control include promoters that initiate transcription only in certain tissues or organs,
such as root, ovule, fruit, seeds, or flowers. Other suitable promoters include those from genes encoding storage
proteins or the lipid body membrane protein, oleosin. A few root-specific promoters are noted above.
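
The "high preference" thresholds can be illustrated with a short Python sketch. It assumes transcription has already been measured per tissue in arbitrary but comparable units, and it reads "over the transcription in any other tissue" as a comparison against the highest of the other tissues. The function names and expression values are hypothetical.

```python
def fold_preference(expression, target):
    """Ratio of transcription in `target` to the highest level in any other tissue."""
    others = [level for tissue, level in expression.items() if tissue != target]
    return expression[target] / max(others)


def preference_tier(fold):
    """Return the highest threshold from the definition that `fold` meets, or None."""
    for threshold in (100, 50, 20, 10, 5, 3):
        if fold >= threshold:
            return threshold
    return None


# Hypothetical measurements (arbitrary units).
levels = {"root": 120.0, "leaf": 10.0, "stem": 6.0}
fold = fold_preference(levels, "root")
print(fold, preference_tier(fold))
```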
[0070] Stringency: "Stringency" as used herein is a function of probe length, probe composition (G + C content),
and salt concentration, organic solvent concentration, and temperature of hybridization or wash conditions. Stringency
is typically compared by the parameter Tm, which is the temperature at which 50% of the complementary molecules
in the hybridization are hybridized, in terms of a temperature differential from Tm. High stringency conditions are those
providing a condition of Tm - 5°C to Tm - 10°C. Medium or moderate stringency conditions are those providing Tm -
20°C to Tm - 29°C. Low stringency conditions are those providing a condition of Tm - 40°C to Tm - 48°C. The relationship
of hybridization conditions to Tm (in °C) is expressed in the mathematical equation

$$T_m = 81.5 + 16.6\,\log_{10}[\mathrm{Na}^+] + 0.41\,(\%\mathrm{G{+}C}) - 600/N \qquad (1)$$

where N is the length of the probe. This equation works well for probes 14 to 70 nucleotides in length that are identical
to the target sequence. The equation below for Tm of DNA-DNA hybrids is useful for probes in the range of 50 to greater
than 500 nucleotides, and for conditions that include an organic solvent (formamide).

$$T_m = 81.5 + 16.6\,\log_{10}\left\{\frac{[\mathrm{Na}^+]}{1 + 0.7\,[\mathrm{Na}^+]}\right\} + 0.41\,(\%\mathrm{G{+}C}) - 500/L - 0.63\,(\%\mathrm{formamide}) \qquad (2)$$

where L is the length of the probe in the hybrid. (P. Tijessen, "Hybridization with Nucleic Acid Probes" in Laboratory
Techniques in Biochemistry and Molecular Biology, P.C. van der Vliet, ed., c. 1993 by Elsevier, Amsterdam.) The Tm
of equation (2) is affected by the nature of the hybrid; for DNA-RNA hybrids Tm is 10-15°C higher than calculated, for
RNA-RNA hybrids Tm is 20-25°C higher. Because the Tm decreases about 1°C for each 1% decrease in homology
when a long probe is used (Bonner et al., J. Mol. Biol. 81:123 (1973)), stringency conditions can be adjusted to favor
detection of identical genes or related family members.
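
The calculation for long DNA-DNA hybrids can be sketched in Python as follows. This is an illustration of equation (2) and of the stringency ranges defined in this paragraph, not a validated laboratory tool. The function names are assumptions, the logarithm is taken as base 10, and the example inputs (salt, G+C content, probe length, formamide) are hypothetical.

```python
import math


def tm_dna_dna(na_molar, gc_percent, length_nt, formamide_percent=0.0):
    """Equation (2): estimated Tm in degrees C for a DNA-DNA hybrid."""
    salt_term = 16.6 * math.log10(na_molar / (1.0 + 0.7 * na_molar))
    return (81.5 + salt_term + 0.41 * gc_percent
            - 500.0 / length_nt - 0.63 * formamide_percent)


# Temperature offsets below Tm (degrees C) for each stringency class.
STRINGENCY_OFFSETS = {"high": (5, 10), "medium": (20, 29), "low": (40, 48)}


def stringency_window(tm, level):
    """Return (warmest, coolest) temperature for the given stringency class."""
    near, far = STRINGENCY_OFFSETS[level]
    return tm - near, tm - far


# Hypothetical probe: 500 nt, 50% G+C, 1.0 M Na+, 50% formamide.
tm = tm_dna_dna(na_molar=1.0, gc_percent=50.0, length_nt=500, formamide_percent=50.0)
print(round(tm, 1), stringency_window(tm, "low"))
```

Because Tm falls by about 1°C per 1% mismatch for long probes, a window expressed as an offset below Tm can be read roughly as a tolerated percentage of mismatch.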

[0071] Equation (2) is derived assuming equilibrium and therefore, hybridizations according to the present invention
are most preferably performed under conditions of probe excess and for sufficient time to achieve equilibrium. The
time required to reach equilibrium can be shortened by inclusion of a hybridization accelerator such as dextran sulfate
or another high volume polymer in the hybridization buffer.

[0072] Stringency can be controlled during the hybridization reaction or after hybridization has occurred by altering
the salt and temperature conditions of the wash solutions used. The formulas shown above are equally valid when
used to compute the stringency of a wash solution. Preferred wash solution stringencies lie within the ranges stated
above; high stringency is 5-8°C below Tm, medium or moderate stringency is 26-29°C below Tm and low stringency is

[0073] Substantially free of: A composition containing A is "substantially free of" B when at least 85% by weight
of the total A+B in the composition is A. Preferably, A comprises at least about 90% by weight of the total of A+B in
the composition, more preferably at least about 95% or even 99% by weight. For example, a plant gene or a DNA
sequence can be considered substantially free of other plant genes or DNA sequences.

[0074] Translational start site: In the context of the current invention, a "translational start site" is usually an ATG
in the cDNA transcript, more usually the first ATG. A single cDNA, however, may have multiple translational start sites.
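
A short Python sketch shows how candidate translational start sites can be listed for a cDNA. The function name and the example sequence are hypothetical, and the sketch reports every ATG without judging which one is actually used for initiation.

```python
def find_atg_positions(cdna):
    """Return the 0-based position of every ATG in a cDNA sequence, in order."""
    seq = cdna.upper()
    positions = []
    start = seq.find("ATG")
    while start != -1:
        positions.append(start)
        start = seq.find("ATG", start + 1)
    return positions


# Hypothetical cDNA fragment with two candidate start codons.
candidates = find_atg_positions("ccATGaaATGtt")
print(candidates, "first ATG at", candidates[0])
```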
[0075] Transcription start site: "Transcription start site" is used in the current invention to describe the point at
which transcription is initiated. This point is typically located about 25 nucleotides downstream from a TFIID binding
site, such as a TATA box. Transcription can initiate at one or more sites within the gene, and a single gene may have
multiple transcriptional start sites, some of which may be specific for transcription in a particular cell-type or tissue.
[0076] Untranslated region (UTR): A "UTR" is any contiguous series of nucleotide bases that is transcribed, but
is not translated. These untranslated regions may be associated with particular functions such as increasing mRNA
message stability. Examples of UTRs include, but are not limited to, polyadenylation signals, termination sequences,
sequences located between the transcriptional start site and the first exon (5' UTR) and sequences located between
the last exon and the end of the mRNA (3' UTR).
[0077] Variant: The term "variant" is used herein to denote a polypeptide or protein or polynucleotide molecule
that differs from others of its kind in some way. For example, polypeptide and protein variants can consist of changes
in amino acid sequence and/or charge and/or post-translational modifications (such as glycosylation, etc.).

DETAILED DESCRIPTION OF THE INVENTION 

I. Polynucleotides 

[0078] Exemplified SDFs of the invention represent fragments of the genome of corn, wheat, rice, soybean or Arabidopsis
and/or represent mRNA expressed from that genome. The isolated nucleic acid of the invention also encompasses
corresponding fragments of the genome and/or cDNA complement of other organisms as described in detail
below.

[0079] Polynucleotides of the invention can be isolated from polynucleotide libraries using primers comprising sequence
similar to those described by the REF and SEQ Tables. See, for example, the methods described in Sambrook
et al., supra.

[0080] Alternatively, the polynucleotides of the invention can be produced by chemical synthesis. Such synthesis
methods are described below. 

[0081] It is contemplated that the nucleotide sequences presented herein may contain some small percentage of 
errors. These errors may arise in the normal course of determination of nucleotide sequences. Sequence errors can 
be corrected by obtaining seeds deposited under the accession numbers cited above, propagating them, isolating 
genomic DNA or appropriate mRNA from the resulting plants or seeds thereof, amplifying the relevant fragment of the
genomic DNA or mRNA using primers having a sequence that flanks the erroneous sequence, and sequencing the 
amplification product. 

I.A. Probes, Primers and Substrates

[0082] SDFs of the invention can be applied to substrates for use in array applications such as, but not limited to,
assays of global gene expression, for example under varying developmental or growth conditions. The arrays
can also be used in diagnostic or forensic methods (WO95/35505, US 5,445,943 and US 5,410,270).
[0083] Probes and primers of the instant invention will hybridize to a polynucleotide comprising a sequence in REF
and SEQ TABLES 1 AND 2. Though many different nucleotide sequences can encode an amino acid sequence, the
sequences of REF and SEQ TABLES 1 AND 2 are generally preferred for encoding polypeptides of the invention.
However, the sequence of the probes and/or primers of the instant invention need not be identical to those in REF and
SEQ TABLES 1 AND 2 or the complements thereof. For example, some variation in probe or primer sequence and/or
length can allow additional family members to be detected, as well as orthologous genes and more taxonomically
distant related sequences. Similarly, probes and/or primers of the invention can include additional nucleotides that
serve as a label for detecting the formed duplex or for subsequent cloning purposes.

[0084] Probe length will vary depending on the application. For use as primers, probes are 12-40 nucleotides, preferably
18-30 nucleotides long. For use in mapping, probes are preferably 50 to 500 nucleotides, more preferably 100-250
nucleotides long. For Southern hybridizations, probes as long as several kilobases can be used as explained below.
[0085] The probes and/or primers can be produced by synthetic procedures such as the triester method of Matteucci
et al., J. Am. Chem. Soc. 103:3185 (1981); or according to Urdea et al., Proc. Natl. Acad. 80:7461 (1981); or using
commercially available automated oligonucleotide synthesizers.
I.B. Methods of Detection and Isolation

[0086] The polynucleotides of the invention can be utilized in a number of methods known to those skilled in the art
as probes and/or primers to isolate and detect polynucleotides, including, without limitation: Southerns, Northerns,
Branched DNA hybridization assays, polymerase chain reaction, and microarray assays, and variations thereof. Specific
methods given by way of examples, and discussed below include:

Hybridization
Methods of Mapping
Southern Blotting
Isolating cDNA from Related Organisms
Isolating and/or Identifying Orthologous Genes.

Also the nucleic acid molecules of the invention can be used in other methods, such as high density oligonucleotide
hybridizing assays, described, for example, in U.S. Pat. Nos. 6,004,753; 5,945,306; 5,945,287; 5,945,308; 5,919,686;
5,919,661; 5,919,627; 5,874,248; 5,871,973; 5,871,971; and 5,871,930; and PCT Pub. Nos. WO 9946380; WO
9933981; WO 9933870; WO 9931252; WO 9915658; WO 9906572; WO 9858052; WO 9958672; and WO 9810858.

B.1. Hybridization

[0087] The isolated SDFs of REF and SEQ TABLES 1 AND 2 of the present invention can be used as probes and/
or primers for detection and/or isolation of related polynucleotide sequences through hybridization. Hybridization of
one nucleic acid to another constitutes a physical property that defines the subject SDF of the invention and the identified
related sequences. Also, such hybridization imposes structural limitations on the pair. A good general discussion of
the factors for determining hybridization conditions is provided by Sambrook et al. ("Molecular Cloning, a Laboratory
Manual", 2nd ed., c. 1989 by Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY; see esp. chapters 11 and
12). Additional considerations and details of the physical chemistry of hybridization are provided by G.H. Keller and
M.M. Manak, "DNA Probes", 2nd Ed., pp. 1-25, c. 1993 by Stockton Press, New York, NY.

[0088] Depending on the stringency of the conditions under which these probes and/or primers are used, polynucleotides
exhibiting a wide range of similarity to those in REF and SEQ TABLES 1 AND 2 can be detected or isolated.
When the practitioner wishes to examine the result of membrane hybridizations under a variety of stringencies, an
efficient way to do so is to perform the hybridization under a low stringency condition, then to wash the hybridization
membrane under increasingly stringent conditions.

[0089] When using SDFs to identify orthologous genes in other species, the practitioner will preferably adjust the
amount of target DNA of each species so that, as nearly as is practical, the same number of genome equivalents are
present for each species examined. This prevents faint signals from species having large genomes, and thus small
numbers of genome equivalents per mass of DNA, from erroneously being interpreted as absence of the corresponding
gene in the genome.
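
The loading adjustment can be sketched in Python. The sketch assumes an average mass of about 650 g/mol per base pair, which is a common approximation rather than a value from this document, and the genome sizes in the example are hypothetical round numbers, not measured values for any species.

```python
AVOGADRO = 6.022e23          # molecules per mole
BP_MASS_G_PER_MOL = 650.0    # assumed average mass of one base pair


def dna_mass_ng(genome_size_bp, genome_equivalents):
    """Mass of DNA (ng) that contains the requested number of genome copies."""
    grams = genome_equivalents * genome_size_bp * BP_MASS_G_PER_MOL / AVOGADRO
    return grams * 1e9


# Hypothetical genome sizes: load one million genome equivalents of each.
for label, size_bp in {"small genome": 1e8, "large genome": 2e9}.items():
    print(label, round(dna_mass_ng(size_bp, 1e6), 1), "ng")
```

Under these assumptions the species with the twentyfold larger genome needs twentyfold more DNA by mass to present the same number of target copies on the blot.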

[0090] The probes and/or primers of the instant invention can also be used to detect or isolate nucleotides that are
"identical" to the probes or primers. Two nucleic acid sequences or polypeptides are said to be "identical" if the sequence
of nucleotides or amino acid residues, respectively, in the two sequences is the same when aligned for maximum
correspondence as described below.

[0091] Isolated polynucleotides within the scope of the invention also include allelic variants of the specific sequences 
presented in REF and SEQ TABLES 1 AND 2. The probes and/or primers of the invention can also be used to detect 
and/or isolate polynucleotides exhibiting at least 80% sequence identity with the sequences of REF and SEQ TABLES 
1 AND 2 or fragments thereof. 
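
Percent sequence identity for two sequences that have already been aligned can be sketched as below. The alignment step itself is not shown, the choice of denominator (all aligned columns except those where both sequences have a gap) is an assumption because conventions differ between tools, and the function name and sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns; gap-versus-residue counts as a mismatch."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue  # column carries no information
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


identity = percent_identity("ACGTACGTAC", "ACGTACGTTT")
print(identity, "meets 80% threshold:", identity >= 80.0)
```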

[0092] With respect to nucleotide sequences, degeneracy of the genetic code provides the possibility to substitute 
at least one base of the base sequence of a gene with a different base without causing the amino acid sequence of 
the polypeptide produced from the gene to be changed. Hence, the DNA of the present invention may also have any 
base sequence that has been changed from a sequence in REF and SEQ TABLES 1 AND 2 by substitution in accordance
with degeneracy of genetic code. References describing codon usage include: Carels et al., J. Mol. Evol. 46: 45
(1998) and Fennoy et al., Nucl. Acids Res. 21(23): 5294 (1993).
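
Degeneracy can be demonstrated with a small Python sketch that translates two different DNA sequences using the standard genetic code and shows that they encode the same peptide. The two 12-nucleotide sequences are made-up examples, not sequences from the REF and SEQ tables.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence in frame 0; '*' marks a stop codon."""
    seq = dna.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


original = "ATGGCTCTGAAA"
substituted = "ATGGCCTTAAAG"   # synonymous changes in three codons
print(translate(original), translate(substituted))
```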

B.2. Mapping 

[0093] The isolated SDF DNA of the invention can be used to create various types of genetic and physical maps of
the genome of corn, Arabidopsis, soybean, rice, wheat, or other plants. Some SDFs may be absolutely associated
with particular phenotypic traits, allowing construction of gross genetic maps. While not all SDFs will immediately be
associated with a phenotype, all SDFs can be used as probes for identifying polymorphisms associated with phenotypes
of interest. Briefly, one method of mapping involves total DNA isolation from individuals. It is subsequently cleaved with
one or more restriction enzymes, separated according to mass, transferred to a solid support, hybridized with SDF
DNA and the pattern of fragments compared. Polymorphisms associated with a particular SDF are visualized as differences
in the size of fragments produced between individual DNA samples after digestion with a particular restriction
enzyme and hybridization with the SDF. After identification of polymorphic SDF sequences, linkage studies can be
conducted. By using the individuals showing polymorphisms as parents in crossing programs, F2 progeny recombinants
or recombinant inbreds, for example, are then analyzed. The order of DNA polymorphisms along the chromosomes
can be determined based on the frequency with which they are inherited together versus independently. The closer
two polymorphisms are together in a chromosome the higher the probability that they are inherited together. Integration
of the relative positions of all the polymorphisms and associated marker SDFs can produce a genetic map of the
species, where the distances between markers reflect the recombination frequencies in that chromosome segment.
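
The ordering logic can be sketched for three markers in Python. The sketch assumes recombinant progeny have already been scored directly for each marker pair, which simplifies real F2 or recombinant inbred analysis, and the counts and marker names A, B and C are hypothetical.

```python
def recombination_frequency(recombinants, total_progeny):
    """Fraction of progeny in which two markers were not inherited together."""
    return recombinants / total_progeny


def order_three_markers(rf_ab, rf_bc, rf_ac):
    """Order markers A, B, C: the pair with the largest frequency lies outermost."""
    # Key = marker that sits in the middle when that pairwise frequency is largest.
    middle_if_largest = {"C": rf_ab, "A": rf_bc, "B": rf_ac}
    middle = max(middle_if_largest, key=middle_if_largest.get)
    outer = [marker for marker in "ABC" if marker != middle]
    return outer[0] + middle + outer[1]


# Hypothetical scores from 200 progeny.
rf_ab = recombination_frequency(10, 200)
rf_bc = recombination_frequency(20, 200)
rf_ac = recombination_frequency(28, 200)
print(order_three_markers(rf_ab, rf_bc, rf_ac))
```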
[0094] The use of recombinant inbred lines for such genetic mapping is described for Arabidopsis by Alonso-Blanco
et al. (Methods in Molecular Biology, vol. 82, "Arabidopsis Protocols", pp. 137-146, J.M. Martinez-Zapater and J. Salinas,
eds., c. 1998 by Humana Press, Totowa, NJ) and for corn by Burr ("Mapping Genes with Recombinant Inbreds", pp.
249-254, in Freeling, M. and V. Walbot (Ed.), The Maize Handbook, c. 1994 by Springer-Verlag New York, Inc.: New
York, NY, USA; Berlin, Germany; Burr et al. Genetics (1998) 118: 519; Gardiner, J. et al., (1993) Genetics 134: 917).
This procedure, however, is not limited to plants and can be used for other organisms (such as yeast) or for individual
cells.

[0095] The SDFs of the present invention can also be used for simple sequence repeat (SSR) mapping. Rice SSR
mapping is described by Morgante et al. (The Plant Journal 3: 165), Panaud et al. (Genome (1995) 38: 1170),
Senior et al. (Crop Science (1996) 36: 1676), Taramino et al. (Genome (1996) 39: 277) and Ahn et al. (Molecular and
General Genetics (1993) 241: 483-90). SSR mapping can be achieved using various methods. In one instance, polymorphisms
are identified when sequence specific probes contained within an SDF flanking an SSR are made and used
in polymerase chain reaction (PCR) assays with template DNA from two or more individuals of interest. Here, a change
in the number of tandem repeats between the SSR-flanking sequences produces differently sized fragments (U.S.
Patent 5,766,847). Alternatively, polymorphisms can be identified by using the PCR fragment produced from the SSR-flanking
sequence specific primer reaction as a probe against Southern blots representing different individuals (U.H.
Refseth et al., (1997) Electrophoresis 18: 1519).
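
The size difference that reveals an SSR polymorphism follows from simple arithmetic, sketched below in Python. The flank lengths, the repeat unit and the repeat counts are hypothetical, and the flank lengths are taken to include the primer binding sites.

```python
def ssr_amplicon_length(left_flank_bp, right_flank_bp, repeat_unit, repeat_count):
    """Expected PCR product length: both flanks plus the tandem repeat tract."""
    return left_flank_bp + right_flank_bp + len(repeat_unit) * repeat_count


# Two hypothetical individuals carrying 12 and 15 copies of a GA repeat.
allele_1 = ssr_amplicon_length(60, 55, "GA", 12)
allele_2 = ssr_amplicon_length(60, 55, "GA", 15)
print(allele_1, allele_2, "difference:", allele_2 - allele_1)
```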

[0096] Genetic and physical maps of crop species have many uses. For example, these maps can be used to devise
positional cloning strategies for isolating novel genes from the mapped crop species. In addition, because the genomes 
of closely related species are largely syntenic (that is, they display the same ordering of genes within the genome), 
these maps can be used to isolate novel alleles from relatives of crop species by positional cloning strategies. 
[0097] The various types of maps discussed above can be used with the SDFs of the invention to identify Quantitative
Trait Loci (QTLs). Many important crop traits, such as the solids content of tomatoes, are quantitative traits and result
from the combined interactions of several genes. These genes reside at different loci in the genome, oftentimes on
different chromosomes, and generally exhibit multiple alleles at each locus. The SDFs of the invention can be used to
identify QTLs and isolate specific alleles as described by de Vicente and Tanksley (Genetics 134:585 (1993)). In addition
to isolating QTL alleles in present crop species, the SDFs of the invention can also be used to isolate alleles from the
corresponding QTL of wild relatives. Transgenic plants having various combinations of QTL alleles can then be created
and the effects of the combinations measured. Once a desired allele combination has been identified, crop improvement
can be accomplished either through biotechnological means or by directed conventional breeding programs (for review
see Tanksley and McCouch, Science 277:1063 (1997)).

[0098] In another embodiment, the SDFs can be used to help create physical maps of the genome of corn, Arabidopsis
and related species. Where SDFs have been ordered on a genetic map, as described above, they can be used
as probes to discover which clones in large libraries of plant DNA fragments in YACs, BACs, etc. contain the same
SDF or similar sequences, thereby facilitating the assignment of the large DNA fragments to chromosomal positions.
Subsequently, the large BACs, YACs, etc. can be ordered unambiguously by more detailed studies of their sequence
composition (e.g. Marra et al. (1997) Genomic Research 7:1072-1084) and by using their end or other sequences to
find the identical sequences in other cloned DNA fragments. The overlapping of DNA sequences in this way allows
large contigs of plant sequences to be built that, when sufficiently extended, provide a complete physical map of a
chromosome. Sometimes the SDFs themselves will provide the means of joining cloned sequences into a contig.
[0099] The patent publication WO95/35505 and U.S. Patents 5,445,943 and 5,410,270 describe scanning multiple
alleles of a plurality of loci using hybridization to arrays of oligonucleotides. These techniques are useful for each of
the types of mapping discussed above.

[0100] Following the procedures described above and using a plurality of the SDFs of the present invention, any
individual can be genotyped. These individual genotypes can be used for the identification of particular cultivars, varieties,
lines, ecotypes and genetically modified plants or can serve as tools for subsequent genetic studies involving
multiple phenotypic traits.

B.3 Southern Blot Hybridization 

[0101] The sequences from REF and SEQ TABLES 1 AND 2 can be used as probes for various hybridization techniques.
These techniques are useful for detecting target polynucleotides in a sample or for determining whether transgenic
plants, seeds or host cells harbor a gene or sequence of interest and thus might be expected to exhibit a particular
trait or phenotype.

[0102] In addition, the SDFs from the invention can be used to isolate additional members of gene families from the
same or different species and/or orthologous genes from the same or different species. This is accomplished by hybridizing
an SDF to, for example, a Southern blot containing the appropriate genomic DNA or cDNA. Given the resulting
hybridization data, one of ordinary skill in the art could distinguish and isolate the correct DNA fragments by size,
restriction sites, sequence and stated hybridization conditions from a gel or from a library.

[0103] Identification and isolation of orthologous genes from closely related species and alleles within a species is
particularly desirable because of their potential for crop improvement. Many important crop traits, such as the solid
content of tomatoes, result from the combined interactions of the products of several genes residing at different loci in
the genome. Generally, alleles at each of these loci can make quantitative differences to the trait. By identifying and
isolating numerous alleles for each locus from within or different species, transgenic plants with various combinations
of alleles can be created and the effects of the combinations measured. Once a more favorable allele combination has
been identified, crop improvement can be accomplished either through biotechnological means or by directed conventional
breeding programs (Tanksley et al. Science 277:1063 (1997)).

[0104] The results from hybridizations of the SDFs of the invention to, for example, Southern blots containing DNA
from another species can also be used to generate restriction fragment maps for the corresponding genomic regions.
These maps provide additional information about the relative positions of restriction sites within fragments, further
distinguishing mapped DNA from the remainder of the genome.

[0105] Physical maps can be made by digesting genomic DNA with different combinations of restriction enzymes. 
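
As an illustration of how combinations of enzymes refine a restriction map, the Python sketch below computes fragment lengths for a linear sequence cut with one or two enzymes. The recognition sites and cut offsets shown are the well-known ones for EcoRI (G^AATTC) and BamHI (G^GATCC); the function names and the 31-nucleotide test sequence are hypothetical, and only the top strand is considered.

```python
def cut_positions(seq, site, offset):
    """Indices at which the top strand is cut for every occurrence of `site`."""
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + offset)
        pos = seq.find(site, pos + 1)
    return cuts


def fragment_lengths(sequence, enzymes):
    """Fragment lengths, in map order, after digesting a linear sequence.

    `enzymes` is a list of (recognition_site, cut_offset) pairs.
    """
    seq = sequence.upper()
    cuts = sorted({p for site, offset in enzymes
                   for p in cut_positions(seq, site, offset)})
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


ECORI = ("GAATTC", 1)
BAMHI = ("GGATCC", 1)
dna = "AAAAGAATTCCCCGGATCCTTTTGAATTCAA"
print(fragment_lengths(dna, [ECORI]))
print(fragment_lengths(dna, [ECORI, BAMHI]))
```

Comparing the single and double digests places the BamHI site within the middle EcoRI fragment, which is the kind of positional information a restriction map records.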
[0106] Probes for Southern blotting to distinguish individual restriction fragments can range in size from 15 to 20
nucleotides to several thousand nucleotides. More preferably, the probe is 100 to 1,000 nucleotides long for identifying
members of a gene family when it is found that repetitive sequences would complicate the hybridization. For identifying
an entire corresponding gene in another species, the probe is more preferably the length of the gene, typically 2,000
to 10,000 nucleotides, but probes 50-1,000 nucleotides long might be used. Some genes, however, might require
probes up to 1,500 nucleotides long or overlapping probes constituting the full-length sequence to span their lengths.
[0107] Also, while it is preferred that the probe be homogeneous with respect to its sequence, it is not necessary.
For example, as described below, a probe representing members of a gene family having diverse sequences can be
generated using PCR to amplify genomic DNA or RNA templates using primers derived from SDFs that include sequences
that define the gene family.

[0108] For identifying corresponding genes in another species, the next most preferable probe is a cDNA spanning
the entire coding sequence, which allows all of the mRNA-coding fragment of the gene to be identified. Probes for
Southern blotting can easily be generated from SDFs by making primers having the sequence at the ends of the SDF
and using corn or Arabidopsis genomic DNA as a template. In instances where the SDF includes sequence conserved
among species, primers including the conserved sequence can be used for PCR with genomic DNA from a species of
interest to obtain a probe. Similarly, if the SDF includes a domain of interest, that fragment of the SDF can be used to
make primers and, with appropriate template DNA, used to make a probe to identify genes containing the domain.
Alternatively, the PCR products can be resolved, for example by gel electrophoresis, and cloned and/or sequenced.
Using Southern hybridization, the variants of the domain among members of a gene family, both within and across
species, can be examined.

B.4.1 Isolating DNA from Related Organisms 

[0109] The SDFs of the invention can be used to isolate the corresponding DNA from other organisms. Either cDNA
or genomic DNA can be isolated. For isolating genomic DNA, a lambda, cosmid, BAC or YAC, or other large insert
genomic library from the plant of interest can be constructed using standard molecular biology techniques as described
in detail by Sambrook et al. 1989 (Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory
Press, New York) and by Ausubel et al. 1992 (Current Protocols in Molecular Biology, Greene Publishing, New York).
[0110] To screen a phage library, for example, recombinant lambda clones are plated out on appropriate bacterial
medium using an appropriate E. coli host strain. The resulting plaques are lifted from the plates using nylon or nitrocellulose
filters. The plaque lifts are processed through denaturation, neutralization, and washing treatments following
the standard protocols outlined by Ausubel et al. (1992). The plaque lifts are hybridized to either radioactively labeled
or non-radioactively labeled SDF DNA at room temperature for about 16 hours, usually in the presence of 50% formamide
and 5X SSC (sodium chloride and sodium citrate) buffer and blocking reagents. The plaque lifts are then washed
at 42°C with 1% Sodium Dodecyl Sulfate (SDS) and at a particular concentration of SSC. The SSC concentration used
is dependent upon the stringency at which hybridization occurred in the initial Southern blot analysis performed. For
example, if a fragment hybridized under medium stringency (e.g., Tm - 20°C), then this condition is maintained or
preferably adjusted to a less stringent condition (e.g., Tm - 30°C) to wash the plaque lifts. Positive clones show detectable
hybridization, e.g., by exposure to X-ray films or chromogen formation. The positive clones are then subsequently
isolated for purification using the same general protocol outlined above. Once the clone is purified, restriction analysis
can be conducted to narrow the region corresponding to the gene of interest. The restriction analysis and succeeding
subcloning steps can be done using procedures described by, for example, Sambrook et al. (1989) cited above.

[0111] The procedures outlined for the lambda library are essentially similar to those used for YAC library screening,
except that the YAC clones are harbored in bacterial colonies. The YAC clones are plated out at reasonable density
on nitrocellulose or nylon filters supported by appropriate bacterial medium in petri plates. Following the growth of the
bacterial clones, the filters are processed through the denaturation, neutralization, and washing steps following the
procedures of Ausubel et al. 1992. The same hybridization procedures for lambda library screening are followed.

[0112] To isolate cDNA, similar procedures using appropriately modified vectors are employed. For instance, the
library can be constructed in a lambda vector appropriate for cloning cDNA such as λgt11. Alternatively, the cDNA
library can be made in a plasmid vector. cDNA for cloning can be prepared by any of the methods known in the art,
but is preferably prepared as described above. Preferably, a cDNA library will include a high proportion of full-length
clones.

B.5. Isolating and/or Identifying Orthologous Genes

[0113] Probes and primers of the invention can be used to identify and/or isolate polynucleotides related to those in
REF and SEQ TABLES 1 AND 2. Related polynucleotides are those that are native to other plant organisms and either
exhibit similar sequence or encode polypeptides with similar biological activity. One specific example is an orthologous
gene. Orthologous genes have the same functional activity. As such, orthologous genes may be distinguished from
homologous genes. The percentage of identity is a function of evolutionary separation and, in closely related species,
the percentage of identity can be 98 to 100%. The amino acid sequence of a protein encoded by an orthologous gene
can be less than 75% identical, but tends to be at least 75% or at least 80% identical, more preferably at least 90%,
most preferably at least 95% identical to the amino acid sequence of the reference protein. To find orthologous genes,
the probes are hybridized to nucleic acids from a species of interest under low stringency conditions, preferably one
where sequences containing as much as 40-45% mismatches will be able to hybridize. This condition is established
by Tm - 40°C to - 48°C (see below). Blots are then washed under conditions of increasing stringency. It is preferable
that the wash stringency be such that sequences that are 85 to 100% identical will hybridize. More preferably, sequences
90 to 100% identical will hybridize and most preferably only sequences greater than 95% identical will hybridize. One
of ordinary skill in the art will recognize that, due to degeneracy in the genetic code, amino acid sequences that are
identical can be encoded by DNA sequences as little as 67% identical or less. Thus, it is preferable, for example, to
make an overlapping series of shorter probes, on the order of 24 to 45 nucleotides, and individually hybridize them to
the same arrayed library to avoid the problem of degeneracy introducing large numbers of mismatches.
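
The degeneracy point and the overlapping-probe strategy described above can be illustrated with a minimal Python sketch. The two coding sequences, the 30-nucleotide probe length and the 15-nucleotide step are illustrative assumptions, not values taken from the tables of the specification, and the function names are hypothetical.

```python
from typing import List


def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions between two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length (already aligned)")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def tile_probes(seq: str, probe_len: int = 30, step: int = 15) -> List[str]:
    """Cut seq into probes of probe_len nucleotides, advancing by step each time."""
    if not 0 < step <= probe_len:
        raise ValueError("step must be between 1 and probe_len")
    if len(seq) < probe_len:
        return []
    return [seq[i:i + probe_len] for i in range(0, len(seq) - probe_len + 1, step)]


# Two hypothetical coding sequences for the same peptide (Gly-Ala-Val-Thr)
# that differ at every third codon position.
seq_a = "GGTGCTGTTACT"
seq_b = "GGCGCCGTCACC"
identity = percent_identity(seq_a, seq_b)  # 8 of 12 positions match

# A 60-nucleotide toy sequence cut into 30-mers that overlap by 15 nucleotides.
probes = tile_probes("ACGT" * 15, probe_len=30, step=15)
```

Short probes keep the absolute number of mismatches per probe small, which is the reason given in the text for preferring them. If the step does not divide the sequence evenly, the last few nucleotides are left uncovered in this sketch.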

[0114] As evolutionary divergence increases, genome sequences also tend to diverge. Thus, one of skill will recognize
that searches for orthologous genes between more divergent species will require the use of lower stringency
conditions compared to searches between closely related species. Also, degeneracy of the genetic code is more of a
problem for searches in the genome of a species more distant evolutionarily from the species that is the source of the
SDF probe sequences.

[0115] The SDFs of the invention can also be used as probes to search for genes that are related to the SDF within
a species. Such related genes are typically considered to be members of a gene family. In such a case, the sequence
similarity will often be concentrated into one or a few fragments of the sequence. The fragments of similar sequence
that define the gene family typically encode a fragment of a protein or RNA that has an enzymatic or structural function.
The percentage of identity in the amino acid sequence of the domain that defines the gene family is preferably at least
70%, more preferably 80 to 95%, most preferably 85 to 99%. To search for members of a gene family within a species,
a low stringency hybridization is usually performed, but this will depend upon the size, distribution and degree of sequence
divergence of domains that define the gene family. SDFs encompassing regulatory regions can be used to
identify coordinately expressed genes by using the regulatory region sequence of the SDF as a probe.

[0116] In the instances where the SDFs are identified as being expressed from genes that confer a particular phenotype,
the SDFs can also be used as probes to assay plants of different species for those phenotypes.
I.C. Methods to Inhibit Gene Expression

[0117] The nucleic acid molecules of the present invention can be used to inhibit gene transcription.
Examples of such methods include, without limitation:

Antisense Constructs;
Ribozyme Constructs;
Chimeraplast Constructs;
Co-Suppression;
Transcriptional Silencing; and
Other Methods to Inhibit Gene Expression.



C.1 Antisense 



[0118] In some instances it is desirable to suppress expression of an endogenous or exogenous gene. A well-known
instance is the FLAVOR-SAVOR™ tomato, in which the gene encoding ACC synthase is inactivated by an antisense
approach, thus delaying softening of the fruit after ripening. See, for example, U.S. Patent No. 5,859,330; U.S. Patent
[...] 254:437-439 (1991); and Hamilton et al., Nature 346:284-287 (1990). Also, timing
of flowering can be controlled by suppression of the FLOWERING LOCUS C (FLC); high levels of this transcript are
associated with late flowering, while absence of FLC is associated with early flowering (S.D. Michaels et al., Plant Cell
11:949 (1999)). Also, the transition of apical meristem from production of leaves with associated shoots to flowering is
regulated by TERMINAL FLOWER1, APETALA1 and LEAFY. Thus, when it is desired to induce a transition from shoot
production to flowering, it is desirable to suppress TFL1 expression (S.J. Liljegren, Plant Cell 11:1007 (1999)). As
another instance, arrested ovule development and female sterility result from suppression of the ethylene forming
enzyme but can be reversed by application of ethylene (D. De Martinis et al., Plant Cell 11:1061 (1999)). The ability
to manipulate female fertility of plants is useful in increasing fruit production and creating hybrids.
[0119] In the case of polynucleotides used to inhibit expression of an endogenous gene, the introduced sequence
need not be perfectly identical to a sequence of the target endogenous gene. The introduced polynucleotide sequence
will typically be at least substantially identical to the target endogenous sequence.

[0120] Some polynucleotide SDFs in REF and SEQ TABLES 1 AND 2 represent sequences that are expressed in
corn, wheat, rice, soybean, Arabidopsis and/or other plants. Thus the invention includes using these sequences to generate
antisense constructs to inhibit translation and/or degradation of transcripts of said SDFs, typically [...].
[0121] To accomplish this, a polynucleotide segment from the desired gene that can hybridize to the mRNA expressed
from the desired gene (the "antisense segment") is operably linked to a promoter such that the antisense strand of RNA
will be transcribed when the construct is present in a host cell. A regulated promoter can be used in the construct to
control transcription of the antisense segment so that transcription occurs only under desired circumstances.
[0122] The antisense segment to be introduced generally will be substantially identical to at least a fragment of the
endogenous gene or genes to be repressed. The sequence, however, need not be perfectly identical to inhibit expression.
Further, the antisense product may hybridize to the untranslated region instead of or in addition to the coding
sequence of the gene. The vectors of the present invention can be designed such that the inhibitory effect applies to
other proteins within a family of genes exhibiting homology or substantial homology to the target gene.
[0123] For antisense suppression, the introduced antisense segment sequence also need not be full length relative
to either the primary transcription product or the fully processed mRNA. Generally, a higher percentage of sequence
identity can be used to compensate for the use of a shorter sequence. Furthermore, the introduced sequence need
not have the same intron or exon pattern, and homology of non-coding segments may be equally effective. Normally,
a sequence of between about 30 or 40 nucleotides and the full length of the transcript can be used, though a sequence
of at least about 100 nucleotides is preferred, a sequence of at least about 200 nucleotides is more preferred, and a
sequence of at least about 500 nucleotides is especially preferred.
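
As a minimal sketch of how an antisense segment relates to the sense strand of a target gene, the following Python fragment takes a stretch of a sense-strand cDNA and returns its reverse complement, which is the sequence that would sit downstream of the promoter so that the transcript pairs with the mRNA. The toy cDNA, the function names and the label "usable" for the 30-nucleotide lower bound are assumptions made for illustration; the tier cut-offs follow the lengths stated in the text.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return dna.translate(COMPLEMENT)[::-1]


def antisense_segment(sense_cdna: str, start: int, length: int) -> str:
    """Return the insert, read 5' to 3' downstream of the promoter, whose
    transcript is complementary to sense_cdna[start:start + length]."""
    fragment = sense_cdna[start:start + length]
    if len(fragment) < length:
        raise ValueError("requested segment runs past the end of the cDNA")
    return reverse_complement(fragment)


def length_tier(n_nucleotides: int) -> str:
    """Map a segment length onto the preference tiers stated in the text."""
    if n_nucleotides >= 500:
        return "especially preferred"
    if n_nucleotides >= 200:
        return "more preferred"
    if n_nucleotides >= 100:
        return "preferred"
    if n_nucleotides >= 30:
        return "usable"  # assumed label for the "about 30 or 40" lower bound
    return "shorter than the stated range"


toy_cdna = "ATGGCCAAGT"                     # hypothetical sense-strand cDNA
insert = antisense_segment(toy_cdna, 0, 6)  # antisense of "ATGGCC"
```

Because the segment need not be full length or perfectly identical, a real design would also weigh sequence identity against length, which this sketch does not attempt.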



C.2. Ribozymes 



[0124] It is also contemplated that gene constructs representing ribozymes and based on the SDFs in REF AND
SEQ TABLES 1 AND 2 are an object of the invention. Ribozymes can also be used to inhibit expression of genes by
suppressing the translation of the mRNA into a polypeptide. It is possible to design ribozymes that specifically pair with
virtually any target RNA and cleave the phosphodiester backbone at a specific location, thereby functionally inactivating
the target RNA. In carrying out this cleavage, the ribozyme is not itself altered, and is thus capable of recycling and
cleaving other molecules, making it a true enzyme. The inclusion of ribozyme sequences within antisense RNAs confers
RNA-cleaving activity upon them, thereby increasing the activity of the constructs.
[0125] A number of classes of ribozymes have been identified. One class of ribozymes is derived from a number of
small circular RNAs, which are capable of self-cleavage and replication in plants. The RNAs replicate either alone (viroid
RNAs) or with a helper virus (satellite RNAs). Examples include RNAs from avocado sunblotch viroid and the satellite
RNAs from tobacco ringspot virus, lucerne transient streak virus, velvet tobacco mottle virus, solanum nodiflorum mottle
virus and subterranean clover mottle virus. The design and use of target RNA-specific ribozymes is described in Haseloff
et al. Nature, 334:585 (1988).

[0126] Like the antisense constructs above, the ribozyme sequence fragment necessary for pairing need not be
identical to the target nucleotides to be cleaved, nor identical to the sequences in REF AND SEQ TABLES 1 AND 2.
Ribozymes may be constructed by combining the ribozyme sequence and some fragment of the target gene which
would allow recognition of the target gene mRNA by the resulting ribozyme molecule. Generally, the sequence in the
ribozyme capable of binding to the target sequence exhibits at least 80%, preferably at least 85%, more preferably
at least 90%, most preferably at least 95%, and even more preferably at least 96%, 97%, 98% or 99% sequence
identity to some fragment of a sequence in REF AND SEQ TABLES 1 AND 2 or the complement thereof. The ribozyme
can be equally effective in inhibiting mRNA translation by cleaving either in the untranslated or coding regions.
Generally, a higher percentage of sequence identity can be used
to compensate for the use of a shorter sequence. Furthermore, the introduced sequence need not have the same
intron or exon pattern, and homology of non-coding segments may be equally effective.
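
The idea of pairing a ribozyme with a chosen stretch of target RNA can be sketched as a simple site search. The sketch below assumes a hammerhead-type ribozyme that cleaves immediately 3' of a GUC triplet, as in the design of Haseloff et al. cited above; the specification itself does not name a triplet, and the arm length of 8 nucleotides, the toy target and the function name are hypothetical.

```python
from typing import List, Tuple


def find_guc_sites(mrna: str, arm_len: int = 8) -> List[Tuple[int, str, str]]:
    """List candidate cleavage sites as (cut_position, upstream_flank, downstream_flank).

    cut_position is the 0-based index of the first nucleotide 3' of the GUC triplet.
    Sites too close to either end to supply both flanks are skipped.
    """
    rna = mrna.upper().replace("T", "U")  # accept DNA or RNA spelling
    sites = []
    start = rna.find("GUC")
    while start != -1:
        cut = start + 3  # cleavage assumed immediately 3' of GUC
        if start >= arm_len and cut + arm_len <= len(rna):
            sites.append((cut, rna[start - arm_len:start], rna[cut:cut + arm_len]))
        start = rna.find("GUC", start + 1)
    return sites


toy_target = "AAAAAAAAGTCCCCCCCCC"  # hypothetical target, written as DNA
sites = find_guc_sites(toy_target)
```

The two flanks returned for each site are the target sequences that the pairing arms of the ribozyme would need to complement; as the text notes, that complementarity need not be perfect.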

C.3. Chimeraplasts 

[0127] The SDFs of the invention, such as those described by the REF and SEQ Tables, can also be used to construct
chimeraplasts that can be introduced into a cell to produce at least one specific nucleotide change in a sequence
corresponding to the SDF of the invention. A chimeraplast is an oligonucleotide comprising DNA and/or RNA that
specifically hybridizes to a target region in a manner which creates a mismatched base-pair. This mismatched base-pair
signals the cell's repair enzyme machinery which acts on the mismatched region resulting in the replacement,
insertion or deletion of designated nucleotide(s). The altered sequence is then expressed by the cell's normal cellular
mechanisms. Chimeraplasts can be designed to repair mutant genes, modify genes, introduce site-specific mutations,
and/or act to interrupt or alter normal gene function (US Pat. Nos. 6,010,907 and 6,004,804; and PCT Pub. No.
WO99/58723 and WO99/07865).

C.4. Sense Suppression 

[0128] The SDFs of REF and SEQ TABLES 1 AND 2 of the present invention are also useful to modulate gene
expression by sense suppression. Sense suppression represents another method of gene suppression by introducing
at least one exogenous copy or fragment of the endogenous sequence to be suppressed.

[0129] Introduction of expression cassettes in which a nucleic acid is configured in the sense orientation with respect
to the promoter into the chromosome of a plant or by a self-replicating virus has been shown to be an effective means
by which to induce degradation of mRNAs of target genes. For an example of the use of this method to modulate
expression of endogenous genes see Napoli et al., The Plant Cell 2:279 (1990), and U.S. Patents Nos. 5,034,323,
5,231,020, and 5,283,184. Inhibition of expression may require some transcription of the introduced sequence.

[0130] For sense suppression, the introduced sequence generally will be substantially identical to the endogenous
sequence intended to be inactivated. The minimal percentage of sequence identity will typically be greater than about
65%, but a higher percentage of sequence identity might exert a more effective reduction in the level of normal gene
products. Sequence identity of more than about 80% is preferred, though about 95% to absolute identity would be
most preferred. As with antisense regulation, the effect would likely apply to any other proteins within a similar family
of genes exhibiting homology or substantial homology to the suppressing sequence.

C.5. Transcriptional Silencing 

[0131] The nucleic acid sequences of the invention, including the SDFs of REF and SEQ TABLES 1 AND 2, and
fragments thereof, contain sequences that can be inserted into the genome of an organism resulting in transcriptional
silencing. Such regulatory sequences need not be operatively linked to coding sequences to modulate transcription of
a gene. Specifically, a promoter sequence without any other element of a gene can be introduced into a genome to
transcriptionally silence an endogenous gene (see, for example, Vaucheret, H et al. (1998) The Plant Journal 16:651-659).
As another example, triple helices can be formed using oligonucleotides based on sequences from REF
AND SEQ TABLES 1 AND 2, fragments thereof, and substantially similar sequence thereto. The oligonucleotide can
be delivered to the host cell and can bind to the promoter in the genome to form a triple helix and prevent transcription.
An oligonucleotide of interest is one that can bind to the promoter and block binding of a transcription factor to the
promoter. In such a case, the oligonucleotide can be complementary to the sequences of the promoter that interact
with transcription binding factors.

C.6. Other Methods to Inhibit Gene Expression 

[0132] Yet another means of suppressing gene expression is to insert a polynucleotide into the gene of interest to
disrupt transcription or translation of the gene.

[0133] Low frequency homologous recombination can be used to target a polynucleotide insert to a gene by flanking
the polynucleotide insert with sequences that are substantially similar to the gene to be disrupted. Sequences from
REF AND SEQ TABLES 1 AND 2, fragments thereof, and substantially similar sequence thereto can be used for
homologous recombination.

[0134] Random insertion of polynucleotides into a host cell genome can also be used to disrupt the gene
of interest. Azpiroz-Leehan et al., Trends in Genetics 13:152 (1997). In this method, screening for clones from a library
containing random insertions is preferred to identify [...].
Such screening can be performed using probes and/or primers described above based on sequences from REF AND
SEQ TABLES 1 AND 2, fragments thereof, and substantially similar sequence thereto. The screening can also be
performed by selecting clones or plants having a desired phenotype.

I.D. Methods of Functional Analysis

[0135] The constructs described in the methods under I.C. above can be used to determine the function of the
polypeptide encoded by the gene that is targeted by the constructs.

[0136] Down-regulating the transcription and translation of the targeted gene in the host cell or organisms, such as
a plant, may produce phenotypic changes as compared to a wild-type cell or organism. In addition, in vitro assays can
be used to determine if any biological activity, such as calcium flux, DNA transcription, nucleotide incorporation, etc.,
is being modulated by the down-regulation of the targeted gene.

[0137] Coordinated regulation of sets of genes, e.g., those contributing to a desired polygenic trait, is sometimes
necessary to obtain a desired phenotype. SDFs of the invention representing transcription activation and DNA binding
domains can be assembled into hybrid transcriptional activators. These hybrid transcriptional activators can be used
with their corresponding DNA elements (i.e., those bound by the DNA-binding SDFs) to effect coordinated expression
of desired genes (J.J. Schwarz et al., Mol. Cell. Biol. 12:266 (1992), A. Martinez et al., Mol. Gen. Genet. 261:546 [...]).

[0138] The SDFs of the invention can also be used in the two-hybrid genetic systems to identify networks of protein-protein
interactions ([...] et al., J. Biol. Chem. 274:36428 (1999), K. Ichimura et al., Biochem. Biophys. Res. Comm. 253:532 (1998)).
The SDFs of the invention can also be used in various expression display methods to identify important protein-DNA
interactions (e.g. B. Luo et al., J. Mol. Biol. 266:479 (1997)).



I.E. Promoters 



[0139] The SDFs of the invention are also useful as structural or regulatory sequences in a construct for modulating
the expression of the corresponding gene in a plant or other organism, e.g. a symbiotic bacterium. For example, promoter
sequences associated to SDFs of REF and SEQ TABLES 1 AND 2 of the present invention can be useful in
directing expression of coding sequences either as constitutive promoters or to direct expression in particular cell types,
tissues or organs or in response to environmental stimuli.
[0140] With respect to the SDFs of the present invention, a promoter is likely to be found in a
genomic DNA (gDNA) sequence located in the first 2000 nucleotides upstream from an initial exon identified in a gDNA
sequence or initial "ATG" or methionine codon or translational start site in a corresponding cDNA sequence. Such
promoters are more likely to be found in the first 1000 nucleotides upstream of an initial ATG or methionine codon or
translational start site of a cDNA sequence corresponding to a gDNA sequence. In particular, the promoter is usually
located upstream of the transcription start site. The fragments [...]
of a promoter in a plant cell will preferably be found to hybridize to gDNA sequences presented [...]
and REF AND SEQ TABLES 1 AND 2 at medium or high stringency, relevant to the length of the probe and its base
composition.
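
The 2000- and 1000-nucleotide upstream windows mentioned above can be pulled out of a genomic sequence with a few lines of Python. This is a minimal sketch that assumes the gene lies on the strand given and that the position of the initial ATG is already known; the toy sequence and the function name are hypothetical.

```python
def upstream_region(gdna: str, start_codon_index: int, window: int = 2000) -> str:
    """Return up to `window` nucleotides immediately upstream of the start codon.

    start_codon_index is the 0-based position of the A of the initial ATG.
    If fewer than `window` nucleotides are available, all of them are returned.
    """
    if not 0 <= start_codon_index <= len(gdna):
        raise ValueError("start codon index lies outside the sequence")
    begin = max(0, start_codon_index - window)
    return gdna[begin:start_codon_index]


# Toy genomic sequence: 2500 nucleotides of filler followed by a short coding start.
toy_gdna = "C" * 2500 + "ATGGCC"
atg = toy_gdna.find("ATG")

candidate_2000 = upstream_region(toy_gdna, atg)               # broad window
candidate_1000 = upstream_region(toy_gdna, atg, window=1000)  # narrower, more likely window
```

For a gene on the opposite strand, the same function can be applied to the reverse complement of the genomic sequence.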

[0141] Promoters are generally modular in nature. Promoters can consist of a basal promoter that functions as a site
for assembly of a transcription complex comprising an RNA polymerase, for example RNA polymerase II. A typical
transcription complex will include additional factors such as TFIIB, TFIID, and TFIIE. Of these, TFIID appears to be the
only one to bind DNA directly. The promoter might also contain one or more enhancers and/or suppressors that function
as binding sites for additional transcription factors that have the function of modulating the level of transcription with
respect to tissue specificity and of transcriptional responses to particular environmental or nutritional factors, and the
like.

[0142] Short DNA sequences representing binding sites for proteins can be separated from each other by intervening
sequences of varying length. For example, within a particular functional module, protein binding sites may be constituted
by regions of 5 to 60, preferably 10 to 30, more preferably 10 to 20 nucleotides. Within such binding sites, there are
typically 2 to 6 nucleotides that specifically contact amino acids of the nucleic acid binding protein. The protein binding
sites are usually separated from each other by 10 to several hundred nucleotides, typically by 15 to 150 nucleotides,
often by 20 to 50 nucleotides. DNA binding sites in promoter elements often display dyad symmetry in their sequence.
Often elements binding several different proteins, and/or a plurality of sites that bind the same protein, will be combined
in a region of 50 to 1,000 basepairs.
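
Dyad symmetry of a binding site can be checked by comparing the site with its own reverse complement. The sketch below tests only for perfect dyad symmetry with no spacer between the half-sites, which is a simplifying assumption; the example sites and the function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return dna.translate(COMPLEMENT)[::-1]


def has_dyad_symmetry(site: str) -> bool:
    """True if the site reads the same on both strands (perfect dyad symmetry)."""
    s = site.upper()
    return len(s) > 0 and s == reverse_complement(s)


symmetric = has_dyad_symmetry("CACGTG")    # reads the same on both strands
asymmetric = has_dyad_symmetry("GATTACA")  # does not
```

Real binding sites are frequently imperfect or interrupted by a spacer, so a practical scan would tolerate a few mismatches between the half-sites.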

[0143] Elements that have transcription regulatory function can be isolated from their corresponding endogenous
gene, or the desired sequence can be synthesized, and recombined in constructs to direct expression of a coding
region of a gene in a desired tissue-specific, temporal-specific or other desired manner of inducibility or suppression.
When hybridizations are performed to identify or isolate elements of a promoter by hybridization to the long sequences
presented in REF AND SEQ TABLES 1 AND 2, conditions are adjusted to account for the above-described nature of
promoters. For example, short probes, constituting the element sought, are preferably used under low temperature
and/or high salt conditions. When long probes, which might include several promoter elements, are used, low to medium
stringency conditions are preferred when hybridizing to promoters across species.

[0144] If a nucleotide sequence of an SDF, or part of the SDF, functions as a promoter or fragment of a promoter,
then nucleotide substitutions, insertions or deletions that do not substantially affect the binding of relevant DNA binding
proteins would be considered equivalent to the exemplified nucleotide sequence. It is envisioned that there are instances
where it is desirable to decrease the binding of relevant DNA binding proteins to silence or down-regulate a
promoter, or conversely to increase the binding of relevant DNA binding proteins to enhance or up-regulate a promoter
and vice versa. In such instances, polynucleotides representing changes to the nucleotide sequence of the DNA-protein
contact region by insertion of additional nucleotides, changes to identity of relevant nucleotides, including use of
chemically-modified bases, or deletion of one or more nucleotides are considered encompassed by the present invention.
In addition, fragments of the promoter sequences described by the REF and SEQ Tables and variants thereof can be
fused with other promoters or fragments to facilitate transcription and/or transcription in specific types of cells or under
specific conditions.

[0145] Promoter function can be assayed by methods known in the art, preferably by measuring activity of a reporter 
gene operatively linked to the sequence being tested for promoter function. Examples of reporter genes include those 
encoding luciferase, green fluorescent protein, GUS, neo, cat and bar. 

I.F. UTRs and Junctions

[0146] Polynucleotides comprising untranslated (UTR) sequences and intron/exon junctions are also within the scope
of the invention. UTR sequences include introns and 5' or 3' untranslated regions (5' UTRs or 3' UTRs). Fragments of
the sequences shown in REF AND SEQ TABLES 1 AND 2 can comprise UTRs and intron/exon junctions.

[0147] These fragments of SDFs, especially UTRs, can have regulatory functions related to, for example, translation
rate and mRNA stability. Thus, these fragments of SDFs can be isolated for use as elements of gene constructs for
regulated production of polynucleotides encoding desired polypeptides.

[0148] Introns of genomic DNA segments might also have regulatory functions. Sometimes regulatory elements,
especially transcription enhancer or suppressor elements, are found within introns. Also, elements related to stability
of heteronuclear RNA and efficiency of splicing and of transport to the cytoplasm for translation can be found in intron
elements. Thus, these segments can also find use as elements of expression vectors intended for use to transform
plants.

[0149] Just as with promoters, UTR sequences and intron/exon junctions can vary from those shown in REF AND
SEQ TABLES 1 AND 2. Such changes from those sequences preferably will not affect the regulatory activity of the
UTRs or intron/exon junction sequences on expression, transcription, or translation unless selected to do so. However,
in some instances, down- or up-regulation of such activity may be desired to modulate traits or phenotypic or in vitro
activity.

I.G. Coding Sequences 

[0150] Isolated polynucleotides of the invention can include coding sequences that encode polypeptides comprising
an amino acid sequence encoded by sequences in REF AND SEQ TABLES 1 AND 2 or an amino acid sequence
presented in REF AND SEQ TABLES 1 AND 2.
[0151] A nucleotide sequence encodes a polypeptide if a cell (or a cell free in vitro system) expressing that nucleotide
sequence produces a polypeptide having the recited amino acid sequence when the nucleotide sequence is transcribed
and the primary transcript is subsequently processed and translated by a host cell (or a cell free in vitro system) harboring
the nucleic acid. Thus, an isolated nucleic acid that encodes a particular amino acid sequence can be a genomic
sequence comprising exons and introns or a cDNA sequence that represents the product of splicing thereof. An isolated
nucleic acid encoding an amino acid sequence also encompasses heteronuclear RNA, which contains sequences that
are spliced out during expression, and mRNA, which lacks those sequences.

[0152] Coding sequences can be constructed using chemical synthesis techniques or by isolating coding sequences
or by modifying such synthesized or isolated coding sequences as described above.

[0153] In addition to coding sequences encoding the polypeptide sequences of REF AND SEQ TABLES 1 AND 2,
which are native to corn, Arabidopsis, soybean, rice, wheat, and other plants, the isolated polynucleotides can be
polynucleotides that encode variants, fragments, and fusions of those native proteins. Such polypeptides are described
below.

[0154] In variant polynucleotides generally, the number of substitutions, deletions or insertions is preferably less than
20%, more preferably less than 15%; even more preferably less than 10%, 5%, 3% or 1% of the number of nucleotides
comprising a particularly exemplified sequence. It is generally expected that non-degenerate nucleotide sequence
changes that result in 1 to 10, more preferably 1 to 5 and most preferably 1 to 3 amino acid insertions, deletions or
substitutions will not greatly affect the function of an encoded polypeptide. The most preferred embodiments are those
wherein 1 to 20, preferably 1 to 10, most preferably 1 to 5 nucleotides are added to, deleted from and/or substituted
in the sequences specifically disclosed in REF AND SEQ TABLES 1 AND 2.

[0155] Insertions or deletions in polynucleotides intended to be used for encoding a polypeptide preferably preserve
the reading frame. This consideration is not so important in instances when the polynucleotide is intended to be used
as a hybridization probe.
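
The two numerical criteria above, the fraction of changed nucleotides and preservation of the reading frame, reduce to simple arithmetic. The following Python sketch shows both checks under the assumption that the changes have already been counted; the function names and the example numbers are hypothetical.

```python
def preserves_reading_frame(inserted: int = 0, deleted: int = 0) -> bool:
    """True if the net change in length is a multiple of three nucleotides."""
    return (inserted - deleted) % 3 == 0


def percent_changed(n_changes: int, reference_length: int) -> float:
    """Substitutions, deletions and insertions as a percentage of the reference length."""
    if reference_length <= 0:
        raise ValueError("reference length must be positive")
    return 100.0 * n_changes / reference_length


in_frame = preserves_reading_frame(inserted=3)    # one codon added
frameshift = preserves_reading_frame(deleted=1)   # single-nucleotide deletion
fraction = percent_changed(12, 600)               # 12 changes in a 600-nucleotide sequence
```

A net change that is a multiple of three restores the frame downstream of the last change, but codons lying between two compensating insertions and deletions are still read out of frame.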

II. Polypeptides and Proteins 

II.A. Native polypeptides and proteins

[0156] Polypeptides within the scope of the invention include both native proteins as well as variants, fragments,
and fusions thereof. Polypeptides of the invention are those encoded by any of the six reading frames of sequences
shown in REF AND SEQ TABLES 1 AND 2, preferably encoded by the three frames reading in the 5' to 3' direction of
the sequences as shown.
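
The six reading frames referred to above are the three frames of the sequence as shown plus the three frames of its reverse complement. A minimal Python sketch follows, assuming the standard genetic code; the toy sequence and the function names are illustrative, and unrecognized codons are rendered as "X".

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from position 0; '*' marks a stop codon."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> dict:
    """Translate all six frames; '+1'..'+3' read the strand as shown, '-1'..'-3' its reverse complement."""
    forward = dna.upper()
    reverse = reverse_complement(forward)
    frames = {}
    for offset in range(3):
        frames["+%d" % (offset + 1)] = translate(forward[offset:])
        frames["-%d" % (offset + 1)] = translate(reverse[offset:])
    return frames


frames = six_frames("ATGGCCTAA")  # toy sequence: Met-Ala-stop in frame +1
```

The three "+" frames correspond to the preferred frames reading in the 5' to 3' direction of the sequence as shown.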

[0157] Native polypeptides include the proteins encoded by the sequences shown in REF AND SEQ TABLES 1 AND
2. Such native polypeptides include those encoded by allelic variants.

[0158] Polypeptide and protein variants will exhibit at least 75% sequence identity to those native polypeptides of
REF AND SEQ TABLES 1 AND 2. More preferably, the polypeptide variants will exhibit at least 85% sequence identity;
even more preferably, at least 90% sequence identity; more preferably at least 95%, 96%, 97%, 98%, or 99% sequence
identity. Fragments of polypeptides will exhibit similar percentages of sequence identity to
the relevant fragments of the native polypeptide. Fusions will exhibit a similar percentage of sequence identity in that
fragment of the fusion represented by the variant of the native peptide.
[0159] Furthermore, polypeptide variants will exhibit at least one of the functional properties of the native protein.
Such properties include, without limitation, protein interaction, DNA interaction, biological activity, immunological activity,
receptor binding, signal transduction, transcription activity, growth factor activity, secondary structure, three-dimensional
structure, etc. As to properties related to in vitro or in vivo activities, the variants preferably exhibit at least
60% of the activity of the native protein; more preferably at least 70%, even more preferably at least 80%, 85%, 90%
or 95% of at least one activity of the native protein.

[0160] One type of variant of native polypeptides comprises amino acid substitutions, deletions and/or insertions.
Conservative substitutions are preferred to maintain the function or activity of the polypeptide.

[0161] Within the scope of percentage of sequence identity described above, a polypeptide of the invention may
have additional individual amino acids or amino acid sequences inserted into the polypeptide in the middle thereof
and/or at the N-terminal and/or C-terminal ends thereof. Likewise, some of the amino acids or amino acid sequences
may be deleted from the polypeptide.

A.1 Antibodies 

[0162] Isolated polypeptides can be utilized to produce antibodies. Polypeptides of the invention can generally be
used, for example, as antigens for raising antibodies by known techniques. The resulting antibodies are useful as
reagents for determining the distribution of the antigen protein within the tissues of a plant or within a cell of a plant.
The antibodies are also useful for examining the production level of proteins in various tissues, for example in a wild-type
plant or following genetic manipulation of a plant, by methods such as Western blotting.

[0163] Antibodies of the present invention, both polyclonal and monoclonal, may be prepared by conventional methods.
In general, the polypeptides of the invention are first used to immunize a suitable animal, such as a mouse, rat,
rabbit, or goat. Rabbits and goats are preferred for the preparation of polyclonal sera due to the volume of serum
obtainable, and the availability of labeled anti-rabbit and anti-goat antibodies as detection reagents. Immunization is
generally performed by mixing or emulsifying the protein in saline, preferably in an adjuvant such as Freund's complete
adjuvant, and injecting the mixture or emulsion parenterally (generally subcutaneously or intramuscularly). A dose of
50-200 µg/injection is typically sufficient. Immunization is generally boosted 2-6 weeks later with one or more injections
of the protein in saline, preferably using Freund's incomplete adjuvant. One may alternatively generate antibodies by
in vitro immunization using methods known in the art, which for the purposes of this invention is considered equivalent
to in vivo immunization.

[0164] Polyclonal antisera are obtained by bleeding the immunized animal into a glass or plastic container, incubating
the blood at 25°C for one hour, followed by incubating the blood at 4°C for 2-18 hours. The serum is recovered by
centrifugation (e.g., 1,000 x g for 10 minutes). About 20-50 ml per bleed may be obtained from rabbits.
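
Centrifuges are often set in revolutions per minute rather than in multiples of g, so the 1,000 x g figure has to be converted for a given rotor. The sketch below derives the conversion from the centripetal acceleration; the 10 cm rotor radius is an assumed example value, not a parameter given in the text, and the function names are hypothetical.

```python
import math

STANDARD_GRAVITY = 9.80665  # m/s^2


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (multiples of g) at a given speed and rotor radius."""
    omega = 2.0 * math.pi * rpm / 60.0  # angular speed in rad/s
    return omega ** 2 * (radius_cm / 100.0) / STANDARD_GRAVITY


def rpm_for_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed in rpm needed to reach a given relative centrifugal force."""
    omega = math.sqrt(rcf * STANDARD_GRAVITY / (radius_cm / 100.0))
    return omega * 60.0 / (2.0 * math.pi)


# Speed needed for 1,000 x g on a rotor with an assumed 10 cm radius.
rpm = rpm_for_rcf(1000.0, radius_cm=10.0)
```

With these assumptions the required speed comes out at a little under 3,000 rpm; a rotor with a different radius needs a different setting for the same force.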

[0165] Monoclonal antibodies are prepared using the method of Kohler and Milstein, Nature 256: 495 (1975), or
modification thereof. Typically, a mouse or rat is immunized as described above. However, rather than bleeding the
animal to extract serum, the spleen (and optionally several large lymph nodes) is removed and dissociated into single
cells. If desired, the spleen cells can be screened (after removal of nonspecifically adherent cells) by applying a cell
suspension to a plate, or well, coated with the protein antigen. B-cells producing membrane-bound immunoglobulin
specific for the antigen bind to the plate, and are not rinsed away with the rest of the suspension. Resulting B-cells, or
all dissociated spleen cells, are then induced to fuse with myeloma cells to form hybridomas, and are cultured in a
selective medium (e.g., hypoxanthine, aminopterin, thymidine medium, "HAT"). The resulting hybridomas are plated by
limiting dilution, and are assayed for the production of antibodies which bind specifically to the immunizing antigen
(and which do not bind to unrelated antigens). The selected Mab-secreting hybridomas are then cultured either in vitro
(e.g., in tissue culture bottles or hollow fiber reactors), or in vivo (as ascites in mice).

[0166] Other methods for sustaining antibody-producing B-cell clones, such as by EBV transformation, are known.

[0167] If desired, the antibodies (whether polyclonal or monoclonal) may be labeled using conventional techniques.
Suitable labels include fluorophores, chromophores, radioactive atoms (particularly ³²P and ¹²⁵I), electron-dense reagents,
enzymes, and ligands having specific binding partners. Enzymes are typically detected by their activity. For
example, horseradish peroxidase is usually detected by its ability to convert 3,3',5,5'-tetramethylbenzidine (TMB) to a
blue pigment, quantifiable with a spectrophotometer.

A.2 In Vitro Applications of Polypeptides 


[0168] Some polypeptides of the invention will have enzymatic activities that are useful in vitro. For example, the
soybean trypsin inhibitor (Kunitz) family is one of the numerous families of proteinase inhibitors. It comprises plant
proteins which have inhibitory activity against serine proteinases from the trypsin and subtilisin families, thiol
proteinases and aspartic proteinases. Thus, these peptides find in vitro use in protein purification protocols and perhaps in
therapeutic settings requiring topical application of protease inhibitors.

[0169] Delta-aminolevulinic acid dehydratase (EC 4.2.1.24) (ALAD) catalyzes the second step in the biosynthesis
of heme, the condensation of two molecules of 5-aminolevulinate to form porphobilinogen, and is also involved in
chlorophyll biosynthesis (Kaczor et al. (1994) Plant Physiol. 104: 1411-7; Smith (1988) Biochem. J. 249: 423-8; Schneider
(1976) Z. Naturforsch. [C] 31: 55-63). Thus, ALAD proteins can be used as catalysts in synthesis of heme derivatives.
Enzymes of biosynthetic pathways generally can be used as catalysts for in vitro synthesis of the compounds
representing products of the pathway.

[0170] Polypeptides encoded by SDFs of the invention can be engineered to provide purification reagents to identify
and purify additional polypeptides that bind to them. This allows one to identify proteins that function as multimers or
elucidate signal transduction or metabolic pathways. In the case of DNA binding proteins, the polypeptide can be used
in a similar manner to identify the DNA determinants of specific binding (S. Pierrou et al., Anal. Biochem. 229:99 (1995),
S. Chusacultanachai et al., J. Biol. Chem. 274:23591 (1999), Q. Lin et al., J. Biol. Chem. 272:27274 (1997)).

II.B. POLYPEPTIDE VARIANTS, FRAGMENTS, AND FUSIONS

[0171] Generally, variants, fragments, or fusions of the polypeptides encoded by the maximum length sequence
(MLS) can exhibit at least one of the activities of the identified domains and/or related polypeptides described in Sections
(C) and (D) of REF TABLES 1 and 2 corresponding to the MLS of interest.
II.B.(1) Variants

[0172] A type of variant of the native polypeptides comprises amino acid substitutions. Conservative substitutions,
described above (see II.), are preferred to maintain the function or activity of the polypeptide. Such substitutions include
conservation of charge, polarity, hydrophobicity, size, etc. For example, one or more amino acid residues within the
sequence can be substituted with another amino acid of similar polarity that acts as a functional equivalent, for example
providing a hydrogen bond in an enzymatic catalysis. Substitutes for an amino acid within an exemplified sequence
are preferably made among the members of the class to which the amino acid belongs. For example, the nonpolar
(hydrophobic) amino acids include alanine, leucine, isoleucine, valine, proline, phenylalanine, tryptophan and
methionine. The polar neutral amino acids include glycine, serine, threonine, cysteine, tyrosine, asparagine, and glutamine.
The positively charged (basic) amino acids include arginine, lysine and histidine. The negatively charged (acidic) amino
acids include aspartic acid and glutamic acid.
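
The class-based rule in paragraph [0172] can be illustrated with a short sketch. It assumes the four classes exactly as listed in that paragraph, written as one-letter codes; the function names and the example sequences are hypothetical and are not part of the specification.

```python
# Amino-acid classes as listed in paragraph [0172], in one-letter code.
AA_CLASSES = {
    "nonpolar": set("ALIVPFWM"),      # Ala, Leu, Ile, Val, Pro, Phe, Trp, Met
    "polar_neutral": set("GSTCYNQ"),  # Gly, Ser, Thr, Cys, Tyr, Asn, Gln
    "basic": set("RKH"),              # Arg, Lys, His
    "acidic": set("DE"),              # Asp, Glu
}


def aa_class(residue: str) -> str:
    """Return the name of the class that a one-letter residue belongs to."""
    r = residue.upper()
    for name, members in AA_CLASSES.items():
        if r in members:
            return name
    raise ValueError(f"not a standard amino acid: {residue!r}")


def is_conservative(a: str, b: str) -> bool:
    """A substitution is treated as conservative if both residues share a class."""
    return aa_class(a) == aa_class(b)


def classify_substitutions(native: str, variant: str):
    """List (position, native, variant, conservative?) for each differing position.

    Assumes the two sequences have equal length (substitutions only).
    """
    if len(native) != len(variant):
        raise ValueError("sequences must have the same length")
    return [
        (i + 1, a, b, is_conservative(a, b))
        for i, (a, b) in enumerate(zip(native, variant))
        if a != b
    ]


if __name__ == "__main__":
    # Hypothetical four-residue example: K->R stays basic, E->G changes class.
    print(classify_substitutions("MKLE", "MRLG"))
```

Class membership is only one of the criteria the paragraph names (charge, polarity, hydrophobicity, size), so this sketch is a coarse first filter rather than a full test of functional equivalence.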

[0173] Within the scope of percentage of sequence identity described above, a polypeptide of the invention may
have additional individual amino acids or amino acid sequences inserted into the polypeptide in the middle thereof
and/or at the N-terminal and/or C-terminal ends thereof. Likewise, some of the amino acids or amino acid sequences
may be deleted from the polypeptide. Amino acid substitutions may also be made in the sequences, conservative
substitutions being preferred.

[0174] One preferred class of variants are those that comprise (1) the domain of an encoded polypeptide and/or (2)
residues conserved between the encoded polypeptide and related polypeptides. For this class of variants, the encoded
polypeptide sequence is changed by insertion, deletion, or substitution at positions flanking the domain and/or
conserved residues.

[0175] Another class of variants includes those that comprise an encoded polypeptide sequence that is changed in
the domain or conserved residues by a conservative substitution.

[0176] Yet another class of variants includes those that lack one of the in vitro activities, or structural features of the
encoded polypeptides. One example is polypeptides or proteins produced from genes comprising dominant negative
mutations. Such a variant may comprise an encoded polypeptide sequence with non-conservative changes in a
particular domain or group of conserved residues.

II.B.(2) FRAGMENTS

[0177] Fragments of particular interest are those that comprise a domain identified for a polypeptide encoded by an
MLS of the instant invention and variants thereof. Also, fragments that comprise at least one region of residues
conserved between an MLS encoded polypeptide and its related polypeptides are of great interest. Fragments are
sometimes useful in the same way that polypeptides corresponding to genes comprising dominant negative mutations are.

II.B.(3) FUSIONS

[0178] Of interest are chimeras comprising (1) a fragment of the MLS encoded polypeptide or variants thereof of
interest and (2) a fragment of a polypeptide comprising the same domain. An example is an AP2 helix encoded by an
MLS of the invention fused to a second AP2 helix from ANT protein, which comprises two AP2 helices. The
invention also encompasses fusions of MLS encoded polypeptides, variants, or fragments thereof fused with related
proteins or fragments thereof.

DEFINITION OF DOMAINS

[0179] The polypeptides of the invention may possess identifying domains as shown in REF TABLES 1 and 2. Specific
domains within the MLS encoded polypeptides are indicated by the reference REF TABLES 1 and 2. In addition, the
domains within the MLS encoded polypeptide can be defined by the region that exhibits at least 70% sequence identity
with the consensus sequences listed in the detailed description below of each of the domains.
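
The 70% identity criterion can be sketched as a simple calculation over two pre-aligned sequences. The specification does not fix an alignment method or an identity formula here, so the convention below (matches divided by the number of aligned columns, ignoring columns where both sequences have a gap) is an assumption, as are the function names and the toy sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumed convention: '-' marks a gap, columns where both sequences are
    gapped are ignored, and a gap never counts as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if not (x == "-" and y == "-")
    ]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


def meets_domain_threshold(aligned_region: str, aligned_consensus: str,
                           threshold: float = 70.0) -> bool:
    """True if the region reaches the identity threshold against the consensus."""
    return percent_identity(aligned_region, aligned_consensus) >= threshold


if __name__ == "__main__":
    # Hypothetical ten-residue region differing from the consensus at one position.
    print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
    print(meets_domain_threshold("ACDEFGHIKL", "ACDEFGHIKV"))
```

Other conventions (for example dividing by the length of the shorter sequence) give different percentages for the same alignment, so the denominator should be stated whenever a threshold such as 70% is applied.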
[0180] The majority of the protein domain descriptions given below are obtained from Prosite
(http://www.expasy.ch/prosite/) and Pfam
(http://pfam.wustl.edu/browse.shtml).

1. (AAA) AAA-protein family signature 

[0181] A large family of ATPases has been described [1 to 5] whose key feature is that they share a conserved region
of about 220 amino acids that contains an ATP-binding site. This family is now called AAA, for 'A'TPases 'A'ssociated
with diverse cellular 'A'ctivities. The proteins that belong to this family contain either one or two AAA domains.
Proteins containing two AAA domains:

- Mammalian and Drosophila NSF (N-ethylmaleimide-sensitive fusion protein) and the fungal homolog, SEC18.
These proteins are involved in intracellular transport between the endoplasmic reticulum and Golgi, as well as
between different Golgi cisternae.
- Mammalian transitional endoplasmic reticulum ATPase (previously known as p97 or VCP), which is involved in the
transfer of membranes from the endoplasmic reticulum to the Golgi apparatus. This protein forms a ring-shaped
homooligomer composed of six subunits. The yeast homolog is CDC48 and it may play a role in spindle pole
proliferation.
- Yeast protein PAS1, essential for peroxisome assembly, and the related protein PAS1 from Pichia pastoris.
- Yeast protein AFG2.
- Sulfolobus acidocaldarius protein SAV and Halobacterium salinarium cdcH, which may be part of a transduction
pathway connecting light to cell division.

[0182] Proteins containing a single AAA domain:

- Escherichia coli and other bacteria ftsH (or hflB) protein. FtsH is an ATP-dependent zinc metallopeptidase that
seems to degrade the heat-shock sigma-32 factor.

[0183] It is an integral membrane protein with a large cytoplasmic C-terminal domain that contains both the AAA and
the protease domains.

- Yeast protein YME1, a protein important for maintaining the integrity of the mitochondrial compartment. YME1 is
also a zinc-dependent protease.
- Yeast protein AFG3 (or YTA10). This protein also seems to contain an AAA domain followed by a zinc-dependent
protease domain.

[0184] Subunits from the regulatory complex of the 26S proteasome [6] which is involved in the ATP-dependent
degradation of ubiquitinated proteins:

a) Mammalian subunit 4 and homologs in other higher eukaryotes, in yeast (gene YTA5) and fission yeast (gene
mts2).
b) Mammalian subunit 6 (TBP7) and homologs in other higher eukaryotes and in yeast (gene YTA2).
c) Mammalian subunit 7 (MSS1) and homologs in other higher eukaryotes and in yeast (gene CIM5 or YTA3).
d) Mammalian subunit 8 (P45) and homologs in other higher eukaryotes and in yeast (SUG1 or CIM3 or TBY1)
and fission yeast (gene let1).

[0185] Other probable subunits such as human TBP1, which seems to influence HIV gene expression by interacting
with the virus tat transactivator protein, and yeast YTA1 and YTA6.

- Yeast protein BCS1, a mitochondrial protein essential for the expression of the Rieske iron-sulfur protein.
- Yeast protein MSP1, a protein involved in intramitochondrial sorting of proteins.
- Yeast protein PAS8, and the corresponding proteins PAS5 from Pichia pastoris and PAY4 from Yarrowia lipolytica.
- Mouse protein SKD1 and its fission yeast homolog (SpAC2G11.06).
- Caenorhabditis elegans meiotic spindle formation protein mei-1.
- Yeast protein SAP1.
- Yeast protein YTA7.
- Mycobacterium leprae hypothetical protein A2126A.

[0186] It is proposed that, in general, the AAA domains in these proteins act as ATP-dependent protein clamps [5].
In addition to the ATP-binding 'A' and 'B' motifs, which are located in the N-terminal half of this domain, there is a highly
conserved region located in the central part of the domain which was used to develop a signature pattern.
Consensus pattern: [LIVMT]-x-[LIVMT]-[LIVMF]-x-[GATMC]-[ST]-[NS]-x(4)-[LIVM]-D-x-A-[LIFA]-x-R

[1] Froehlich K.-U., Fries H.W., Ruediger M., Erdmann R., Botstein D., Mecke D. J. Cell Biol. 114:443-453(1991).
[2] Erdmann R., Wiebel F.F., Flessau A., Rytka J., Beyer A., Froehlich K.-U., Kunau W.-H. Cell 64:499-510(1991).
[3] Peters J.-M., Walsh M.J., Franke W.W. EMBO J. 9:1757-1767(1990).
[4] Kunau W.-H., Beyer A., Goette K., Marzioch M., Saidowsky J., Skaletz-Rorowski A., Wiebel F.F. Biochimie 75:
[pages missing in source].
[5] Confalonieri F., Duguet M. BioEssays 17:639-650(1995).
[6] Hilt W., Wolf D.H. Trends Biochem. Sci. 21:96-102(1996).

2. ABC Membrane (ABC transporter transmembrane region). This family represents a unit of six transmembrane
helices. Many members of the ABC transporter family (ABC tran) have two such regions. See also descriptions of
ABC tran, below, and ABC2 membrane, above.

3. (ABC Tran) 

ABC transporters family signature 

[0187] On the basis of sequence similarities a family of related ATP-binding proteins has been characterized [1 to 5].
These proteins are associated with a variety of distinct biological processes in both prokaryotes and eukaryotes, but a
majority of them are involved in active transport of small hydrophilic molecules across the cytoplasmic membrane. All
these proteins share a conserved domain of some two hundred amino acid residues, which includes an ATP-binding
site. These proteins are collectively known as ABC transporters. Proteins known to belong to this family are listed
below (references are only provided for recently determined sequences).

In prokaryotes:

- Active transport systems components: alkylphosphonate uptake (phnC/phnK/phnL); arabinose (araG); arginine (artP);
dipeptide (dciAD; dppD/dppF); ferric enterobactin (fepC); ferrichrome (fhuC); galactoside (mglA); glutamine (glnQ);
glycerol-3-phosphate (ugpC); glycine betaine/L-proline (proV); glutamate/aspartate (gltL); histidine (hisP);
iron(III) (sfuC), iron(III) dicitrate (fecE); lactose (lacK); leucine/isoleucine/valine (braF/braG; livF/livG);
maltose (malK); molybdenum (modC); nickel (nikD/nikE); oligopeptide (amiE/amiF; oppD/oppF); peptide (sapD/sapF);
phosphate (pstB); putrescine (potG); ribose (rbsA); spermidine/putrescine (potA); sulfate (cysA); vitamin B12 (btuD).
- Hemolysin/leukotoxin export proteins hlyB, cyaB and lktB.
- Colicin V export protein cvaB.
- Lactococcin export protein lcnC [6].
- Lantibiotic transport proteins nisT (nisin) and spaT (subtilin).
- Extracellular proteases B and C export protein prtD.
- Alkaline protease secretion protein aprD.
- Beta-(1,2)-glucan export proteins chvA and ndvA.
- Haemophilus influenzae capsule-polysaccharide export protein bexA.
- Cytochrome c biogenesis proteins ccmA (also known as cycV and helA).
- Polysialic acid transport protein kpsT.
- Cell division associated ftsE protein (function unknown).
- Copper processing protein nosF from Pseudomonas stutzeri.
- Nodulation protein nodI from Rhizobium (function unknown).
- Escherichia coli proteins cydC and cydD.
- Subunit A of the ABC excision nuclease (gene uvrA).
- Erythromycin resistance protein from Staphylococcus epidermidis (gene msrA).
- Tylosin resistance protein from Streptomyces fradiae (gene tlrC) [7].
- Heterocyst differentiation protein (gene hetA) from Anabaena PCC 7120.
- Protein P29 from Mycoplasma hyorhinis, a probable component of a high affinity transport system.
- yhbG, a putative protein whose gene is linked with ntrA in many bacteria such as Escherichia coli, Klebsiella
pneumoniae, Pseudomonas putida, Rhizobium meliloti and Thiobacillus ferrooxidans.
- Escherichia coli and related bacteria hypothetical proteins yabJ, yadG, yagC, ybbA, ycjW, yddA, yehX, yejF, yheS,
yhiG, yhiH, yjcW, yjjK, yojI, yrbF and ytfR.

In eukaryotes:

- The multidrug transporters (Mdr) (P-glycoprotein), a family of closely related proteins which extrude a wide variety
of drugs out of the cell (for a review see [8]).
- Cystic fibrosis transmembrane conductance regulator (CFTR), which is most probably involved in the transport of
chloride ions.
- Antigen peptide transporters 1 (TAP1, PSF1, RING4, HAM-1, mtp1) and 2 (TAP2, PSF2, RING11, HAM-2, mtp2), which
are involved in the transport of antigens from the cytoplasm to a membrane-bound compartment for association with
MHC class I molecules.
- 70 Kd peroxisomal membrane protein (PMP70).
- ALDP, a peroxisomal protein involved in X-linked adrenoleukodystrophy [9].
- Sulfonylurea receptor [10], a putative subunit of the B-cell ATP-sensitive potassium channel.
- Drosophila proteins white (w) and brown (bw), which are involved in the import of ommatidium screening pigments.
- Fungal elongation factor 3 (EF-3).
- Yeast STE6, which is responsible for the export of the a-factor pheromone.
- Yeast mitochondrial transporter ATM1.
- Yeast MDL1 and MDL2.
- Yeast SNQ2.
- Yeast sporidesmin resistance protein (gene PDR5 or STS1 or YDR1).
- Fission yeast heavy metal tolerance protein hmt1. This protein is probably involved in the transport of metal-bound
phytochelatins.
- Fission yeast brefeldin A resistance protein (gene bfr1 or hba2).
- Fission yeast leptomycin B resistance protein (gene pmd1).
- mbpX, a hypothetical chloroplast protein from Liverwort.
- Prestalk-specific protein tagB from slime mold. This protein consists of two domains: an N-terminal subtilase
catalytic domain and a C-terminal ABC transporter domain.

As a signature pattern for this class of proteins, a conserved region which is located between the 'A' and the 'B'
motifs of the ATP-binding site was used.

[0188] Consensus pattern: [LIVMFYC]-[SA]-[SAPGLVFYKQH]-G-[DENQMW]-[KRQASPCLIMFW]-[KRNQSTAVM]-
[KRACLVM]-[LIVMFYPAN]-{PHY}-[LIVMFW]-[SAGCLIVP]-{FYWHP}-{KRHP}-[LIVMFYWSTA]
The ATP-binding region is duplicated in araG, mdl, msrA, rbsA, tlrC, uvrA, yejF, Mdr's, CFTR, pmd1 and in EF-3. In
some of those proteins, the above pattern only detects one of the two copies of the domain. The proteins belonging to
this family also contain one or two copies of the ATP-binding motifs 'A' and 'B'.
[1] Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R., Gallagher M.P. J. Bioenerg. Biomembr. 22:
571-592(1990).
[2] Higgins C.F., Gallagher M.P., Mimmack M.M., Pearce S.R. BioEssays 8:111-116(1988).
[3] Higgins C.F., Hiles I.D., Salmond G.P.C., Gill D.R., Downie J.A., Evans I.J., Holland I.B., Gray L., Buckels S.D.,
Bell A.W., Hermodson M.A. Nature 323:448-450(1986).
[4] Doolittle R.F., Johnson M.S., Husain I., van Houten B., Thomas D.C., Sancar A. Nature 323:451-453(1986).
[5] Blight M.A., Holland I.B. Mol. Microbiol. 4:873-880(1990).
[6] Stoddard G.W., Petzel J.P., van Belkum M.J., Kok J., McKay L.L. Appl. Environ. Microbiol. 58:1952-1961(1992).
[7] Rosteck P.R. Jr., Reynolds P.A., Hershberger C.L. Gene 102:27-32(1991).
[8] Gottesman M.M., Pastan I. J. Biol. Chem. 263:12163-12166(1988).
[9] Valle D., Gaertner J. Nature 361:682-683(1993).
[10] Aguilar-Bryan L., Nichols C.G., Wechsler S.W., Clement J.P. IV, Boyd A.E. III, Gonzalez G., Herrera-Sosa H.,
Nguy K., Bryan J., Nelson D.A. Science 268:423-426(1995).

4. (ACBP)

Acyl-CoA-binding protein signature 

[0189] Acyl-CoA-binding protein (ACBP) is a small (10 Kd) protein that binds medium- and long-chain acyl-CoA
esters with very high affinity and may function as an intracellular carrier of acyl-CoA esters [1]. ACBP is also known
as diazepam binding inhibitor (DBI) or endozepine (EP) because of its ability to displace diazepam from the
benzodiazepine (BZD) recognition site located on the GABA type A receptor. It is therefore possible that this protein also acts
as a neuropeptide to modulate the action of the GABA receptor [2]. ACBP is a highly conserved protein of about 90
residues that has been so far found in vertebrates, insects and yeast. ACBP is also related to the N-terminal section
of a probable transmembrane protein of unknown function which has been found in mammals. As a signature pattern,
the region that corresponds to residues 19 to 37 in mammalian ACBP was selected.
Consensus pattern: P-[STA]-x-[DEN]-x-[LIVMF]-x(2)-[LIVMFY]-Y-[GSTA]-x-[FY]-K-Q-[STA](2)-x-G

[1] Rose T.M., Schultz E.R., Todaro G.J. Proc. Natl. Acad. Sci. U.S.A. 89:11287-11291(1992).
[2] Costa E., Guidotti A. Life Sci. 49:325-344(1991).
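
The consensus patterns in this section use PROSITE notation: `x` is any residue, `[...]` lists allowed residues, `{...}` lists excluded residues, and `(n)` or `(n,m)` gives a repeat count. The following sketch, which is not part of the specification, translates such a pattern into a Python regular expression and scans a sequence for it. The function names and the test sequence are hypothetical, and only the notation elements just listed (plus the `<` and `>` terminus anchors) are handled.

```python
import re

# One PROSITE element: optional '<', a core (x, a residue, [..] or {..}),
# an optional repeat count (n) or (n,m), and an optional '>'.
_ELEMENT = re.compile(
    r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        if not element:
            continue  # tolerate a stray trailing hyphen
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, core, low, high, end = m.groups()
        if core == "x":
            rx = "."                       # any residue
        elif core.startswith("{"):
            rx = "[^" + core[1:-1] + "]"   # excluded residues
        else:
            rx = core                      # single residue or [allowed set]
        if low is not None:
            rx += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + rx + ("$" if end else ""))
    return "".join(parts)


def find_motifs(sequence: str, pattern: str):
    """Return (1-based start, matched segment) for every match, overlaps included."""
    rx = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in rx.finditer(sequence.upper())]


ACBP_PATTERN = (
    "P-[STA]-x-[DEN]-x-[LIVMF]-x(2)-[LIVMFY]-Y-[GSTA]-x-[FY]-K-Q-[STA](2)-x-G"
)

if __name__ == "__main__":
    # Synthetic sequence built to satisfy the pattern; not a real ACBP.
    print(prosite_to_regex(ACBP_PATTERN))
    print(find_motifs("MMPSADALAALYGAFKQATAGKK", ACBP_PATTERN))
```

The lookahead wrapper in `find_motifs` is a design choice that reports overlapping matches; a plain `re.finditer` on the translated pattern would skip a match that starts inside an earlier one.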

5. (AIRS) 

AIR synthase related proteins 

[0190] This family includes hydrogen expression/formation protein HypE, AIR synthases, FGAM synthase and
selenide, water dikinase.

6. (AMP-binding) 


Putative AMP-binding domain signature 

[0191] It has been shown [1 to 5] that a number of prokaryotic and eukaryotic enzymes which all probably act via an
ATP-dependent covalent binding of AMP to their substrate, share a region of sequence similarity. These enzymes are:

- Insects luciferase (luciferin 4-monooxygenase). Luciferase produces light by catalyzing the oxidation of luciferin in
presence of ATP and molecular oxygen.
- Alpha-aminoadipate reductase from yeast (gene LYS2). This enzyme catalyzes the activation of alpha-aminoadipate
by ATP-dependent adenylation and the reduction of activated alpha-aminoadipate by NADPH.
- Acetate--CoA ligase (acetyl-CoA synthetase), an enzyme that catalyzes the formation of acetyl-CoA from acetate
and CoA.
- Long-chain-fatty-acid--CoA ligase, an enzyme that activates long-chain fatty acids for both the synthesis of
cellular lipids and their degradation via beta-oxidation.
- 4-coumarate--CoA ligase (4CL), a plant enzyme that catalyzes the formation of 4-coumarate-CoA from 4-coumarate
and coenzyme A; the branchpoint reactions between general phenylpropanoid metabolism and pathways leading to
various specific end products.
- O-succinylbenzoic acid--CoA ligase (OSB-CoA synthetase) (gene menE) [6], a bacterial enzyme involved in the
biosynthesis of menaquinone (vitamin K2).
- 4-Chlorobenzoate--CoA ligase (EC 6.2.1.-) (4-CBA--CoA ligase) [7], a Pseudomonas enzyme involved in the
degradation of 4-CBA.
- Indoleacetate--lysine ligase (IAA-lysine synthetase) [8], an enzyme from Pseudomonas syringae that converts
indoleacetate to IAA-lysine.
- Bile acid-CoA ligase (gene baiB) from Eubacterium strain VPI 12708 [4]. This enzyme catalyzes the ATP-dependent
formation of a variety of C-24 bile acid-CoA.
- Crotonobetaine/carnitine-CoA ligase (EC 6.3.2.-) from Escherichia coli (gene caiC).
- L-(alpha-aminoadipyl)-L-cysteinyl-D-valine synthetase (ACV synthetase) from various fungi (gene acvA or pcbAB).
This enzyme catalyzes the first step in the biosynthesis of penicillin and cephalosporin, the formation of ACV from
the constituent amino acids. The amino acids seem to be activated by adenylation. It is a protein of around 3700
amino acids that contains three related domains of about 1000 amino acids.
- Gramicidin S synthetase I (gene grsA) from Bacillus brevis. This enzyme catalyzes the first step in the biosynthesis
of the cyclic antibiotic gramicidin S, the ATP-dependent racemization of phenylalanine.
- Tyrocidine synthetase I (gene tycA) from Bacillus brevis. The reaction carried out by tycA is identical to that
catalyzed by grsA.
- Gramicidin S synthetase II (gene grsB) from Bacillus brevis. This enzyme is a multifunctional protein
that activates and polymerizes proline, valine, ornithine and leucine. GrsB consists of four related domains.
- Enterobactin synthetase components E (gene entE) and F (gene entF) from Escherichia coli. These two enzymes are
involved in the ATP-dependent activation of respectively 2,3-dihydroxybenzoate and serine during enterobactin
(enterochelin) biosynthesis.
- Cyclic peptide antibiotic surfactin synthase subunits 1, 2 and 3 from Bacillus subtilis. Subunits 1 and 2
contain three related domains while subunit 3 only contains a single domain.
- HC-toxin synthetase (gene HTS1) from Cochliobolus carbonum. This enzyme activates the four amino acids (Pro,
L-Ala, D-Ala and 2-amino-9,10-epoxi-8-oxodecanoic acid) that make up HC-toxin, a cyclic tetrapeptide. HTS1 consists
of four related domains.

There are also some proteins, whose exact function is not yet known, but which are, very probably, also AMP-binding
enzymes. These proteins are:

- ORA (octapeptide-repeat antigen), a Plasmodium falciparum protein whose function is not known but
which shows a high degree of similarity with the above proteins.
- AngR, a Vibrio anguillarum protein. AngR is thought to be a transcriptional activator which modulates the
anguibactin (an iron-binding siderophore) biosynthesis gene cluster operon. But it is believed [9] that angR is not a
DNA-binding protein, but rather an enzyme involved in the biosynthesis of anguibactin. This conclusion is based on
three facts: the presence of the AMP-binding domain; the size of angR (1048 residues), which is far bigger than any
bacterial transcriptional protein; and the presence of a probable S-acyl thioesterase immediately downstream of angR.
- A hypothetical protein in mmsB 3' region in Pseudomonas aeruginosa.
- Escherichia coli hypothetical protein ydiD.
- Yeast hypothetical protein YBR041w.
- Yeast hypothetical protein YBR222C.
- Yeast hypothetical protein YER147c.

All these proteins contain a highly conserved region very rich in glycine, serine, and threonine which is followed by
a conserved lysine. A parallel can be drawn between this type of domain and the G-x(4)-G-K-[ST] ATP-/GTP-binding
'P-loop' domain or the protein kinases G-x-G-x(2)-[SG]-x(10,20)-K ATP-binding domains.

[0192] Consensus pattern: [LIVMFY]-x(2)-[STG]-[STAG]-G-[ST]-[STEI]-[SG]-x-[PASLIVM]-[KR]
Note: in a majority of cases the residue that follows the Lys at the end of the pattern is a Gly.

[1] Toh H. Protein Seq. Data Anal. 4:111-117(1991).
[2] Smith D.J., Earl A.J., Turner G. EMBO J. 9:2743-2750(1990).
[3] Schroeder J. Nucleic Acids Res. 17:460-460(1989).
[4] Mallonee D.H., Adams J.L., Hylemon P.B. J. Bacteriol. 174:2065-2071(1992).
[5] Turgay K., Krause M., Marahiel M.A. Mol. Microbiol. 6:529-546(1992).
[6] Driscoll J.R., Taber H.W. J. Bacteriol. 174:5063-5071(1992).
[7] Babbitt P.C., Kenyon G.L., Martin B.M., Charest H., Sylvestre M., Scholten J.D., Chang K.-H., Liang P.-H.,
Dunaway-Mariano D. Biochemistry 31:5594-5604(1992).
[8] Farrell D.H., Mikesell P., Actis L.A., Crosa J.H. Gene 86:45-51(1990).
7. AP2 domain

[0193] This 60 amino acid residue domain can bind to DNA [1]. This domain is plant specific. Members of this family
are suggested to be related to pyridoxal phosphate-binding domains such as found in [illegible in source]. AP2 domains
are also described in Jofuku et al., copending U.S. Patent applications 08/700,152, 08/879,827, 08/912,272,
09/026,039.

[1] Ohme-Takagi M, Shinshi H; Plant Cell 1995;7:173-182.
[2] Weigel D; Plant Cell 1995;7:388-389.
[3] Mushegian AR, Koonin EV; Genetics 1996;144:817-828.

8. ARID 

[0194] The ARID domain is an AT-Rich Interaction domain sharing structural homology to DNA replication and repair
nucleases and polymerases.

[1] Herrscher RF, Kaplan MH, Leisz DL, Das C, Scheuermann R, Tucker PW; Genes Dev 1995;9:3067-3082.
[2] Yuan YC, Whitson RH, Liu Q, Itakura K, Chen Y; Nat Struct Biol 1998;5:959-964.
9. (ATP synt)

ATP synthase gamma subunit signature 

[0195] ATP synthase (proton-translocating ATPase) (EC 3.6.1.34) [1,2] is a component of the cytoplasmic membrane
of eubacteria, the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATPase complex
is composed of an oligomeric transmembrane sector, called CF(0), and a catalytic core, called coupling factor CF(1).
The former acts as a proton channel; the latter is composed of five subunits, alpha, beta, gamma, delta and epsilon.
Subunit gamma is believed to be important in regulating ATPase activity and the flow of protons through the CF(0)
complex. The best conserved region of the gamma subunit [3] is its C-terminus, which seems to be essential for
assembly and catalysis. As a signature pattern to detect ATPase gamma subunits, a 14 residue conserved segment where
the last amino acid is found one to three residues from the C-terminal extremity was used.

[0196] Consensus pattern: [IV]-T-x-E-x(2)-[DE]-x(3)-G-A-x-[SAKR]
Note: Pea chloroplast gamma and two Bacillus species gamma subunits are not detected by this motif.
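
Paragraph [0195] adds a positional condition to the gamma subunit signature: the last residue of the 14-residue segment lies one to three residues from the C-terminal extremity. A minimal sketch of checking both the pattern and that condition follows. The regular expression is a hand translation of the consensus pattern in [0196]; the function name, the reading of "one to three residues" as the number of residues remaining after the match, and the synthetic test sequences are assumptions made for illustration.

```python
import re

# Hand translation of [IV]-T-x-E-x(2)-[DE]-x(3)-G-A-x-[SAKR] (14 residues).
GAMMA_SIGNATURE = "[IV]T.E.{2}[DE].{3}GA.[SAKR]"


def gamma_signature_hits(sequence: str, min_tail: int = 1, max_tail: int = 3):
    """Find signature matches and report how far each ends from the C-terminus.

    'residues_after' counts residues between the end of the match and the end
    of the sequence; 'near_c_terminus' applies the assumed 1-3 residue window.
    """
    seq = sequence.upper()
    hits = []
    # The lookahead lets overlapping candidate matches be reported as well.
    for m in re.finditer("(?=(" + GAMMA_SIGNATURE + "))", seq):
        end = m.start() + len(m.group(1))
        tail = len(seq) - end
        hits.append({
            "start": m.start() + 1,          # 1-based position of the match
            "match": m.group(1),
            "residues_after": tail,
            "near_c_terminus": min_tail <= tail <= max_tail,
        })
    return hits


if __name__ == "__main__":
    # Synthetic sequences, not real gamma subunits.
    print(gamma_signature_hits("MKTITAELIEIVSGAAALE"))      # match ends 2 from the end
    print(gamma_signature_hits("MKTITAELIEIVSGAAALEKKKK"))  # match ends 6 from the end
```

Keeping the positional test separate from the regular expression makes it possible to report matches that satisfy the pattern but fall outside the C-terminal window, which may be useful when a sequence is a fragment.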

[1] Futai M., Noumi T., Maeda M. Annu. Rev. Biochem. 58:111-136(1989).
[2] Senior A.E. Physiol. Rev. 68:177-231(1988).
[3] Miki J., Maeda M., Mukohata Y., Futai M. FEBS Lett. 232:221-226(1988).

10. (ATP synt A)

ATP synthase a subunit signature

[0197] ATP synthase (proton-translocating ATPase) (EC 3.6.1.34) [1,2] is a component of the cytoplasmic membrane
of eubacteria, the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATPase complex
is composed of an oligomeric transmembrane sector, called CF(0), which acts as a proton channel, and a catalytic
core, termed coupling factor CF(1). The CF(0) a subunit, also called protein 6, is a key component of the proton channel;
it may play a direct role in translocating protons across the membrane. It is a highly hydrophobic protein that has been
predicted to contain 8 transmembrane regions [3]. Sequence comparison of a subunits from all available sources reveals
very few conserved regions. The best conserved region is located in what is predicted to be the fifth transmembrane
domain. This region contains three perfectly conserved residues: an arginine, a leucine and an asparagine.
Mutagenesis experiments [text missing in source] of ATPase activity. This region was selected as a signature pattern.

Consensus pattern: [STAGN]-x-[STAG]-[LIVMF]-R-L-x-[SAGV]-N-[LIVMT] [R is important for proton translocation] 

[1] Futai M., Noumi T., Maeda M. Annu. Rev. Biochem. 58:111-136(1989).
[2] Senior A.E. Physiol. Rev. 68:177-231(1988).
[3] Lewis M.L., Chang J.A., Simoni R.D. J. Biol. Chem. 265:10541-10550(1990).
[4] Cain B.D., Simoni R.D. J. Biol. Chem. 264:3292-3300(1989).

11. ATP synthase B

[0198] Part of the CF(0) (base unit) of the ATP synthase. The base unit is thought to translocate protons through the
membrane (inner membrane in mitochondria, thylakoid membrane in plants, cytoplasmic membrane in bacteria). The
B subunits are thought to interact with the stalk of the CF(1) subunits.

12. (ATP synt C) 

ATP synthase c subunit signature 

[0199] ATP synthase (proton-translocating ATPase) [1,2] is a component of the cytoplasmic membrane of eubacteria,
the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATPase complex is composed
of an oligomeric transmembrane sector, called CF(0), which acts as a proton channel, and a catalytic core, termed
coupling factor CF(1). The CF(0) c subunit (also called protein 9, proteolipid, or subunit III) [3,4] is a highly hydrophobic
protein of about 8 Kd which has been implicated in the proton-conducting activity of ATPase. Structurally, subunit c
consists of two long terminal hydrophobic regions, which probably span the membrane, and a central hydrophilic region.
N,N'-dicyclohexylcarbodiimide (DCCD) can bind covalently to subunit c and thereby abolish the ATPase activity. DCCD
binds to a specific glutamate or aspartate residue which is located in the middle of the second hydrophobic region near
the C-terminus of the protein. A signature pattern which includes the DCCD-binding residue was derived.
[0200] Consensus pattern: [GSTA]-R-[NQ]-P-x(10)-[LIVMFYW](2)-x(3)-[LIVMFYW]-x-[DE] [D or E binds DCCD]

[1] Futai M., Noumi T., Maeda M. Annu. Rev. Biochem. 58:111-136(1989).
[2] Senior A.E. Physiol. Rev. 68:177-231(1988).
[3] Ivaschenko A.T., Karpenyuk T.A., Ponomarenko S.V. Biokhimiia [volume and pages illegible in source].
[4] Recipon H., Perasso R., Adoutte A., Quetier F. J. Mol. Evol. 34:292-303(1992).

13. (ATP synt DE)

ATP synthase, Delta/Epsilon chain

[0201] Part of the ATP synthase CF(1). These subunits are part of the head unit of the ATP synthase. The subunits
are called delta and epsilon in human and metazoan species, but in bacterial species the delta (D) subunit is the
equivalent to the oligomycin sensitive subunit (OSCP) in metazoans.

14. (ATP synt ab) 

ATP synthase alpha and beta subunits signature 

[0202] ATP synthase (proton-translocating ATPase) [1,2] is a component of the cytoplasmic membrane of eubacteria,
the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATPase complex is composed
of an oligomeric transmembrane sector, called CF(0), and a catalytic core, called CF(1). The former
acts as a proton channel; the latter is composed of five subunits, alpha, beta, gamma, delta and epsilon. The sequences
of subunits alpha and beta are related and both contain a nucleotide-binding site for ATP and ADP. The beta chain
has catalytic activity, while the alpha chain is a regulatory subunit. Vacuolar ATPases [3] (V-ATPases) are responsible for
acidifying a variety of intracellular compartments in eukaryotic cells. Like F-ATPases, they are oligomeric complexes
of a transmembrane and a catalytic sector. The sequence of the largest subunit of the catalytic sector is related
to that of F-ATPase beta subunit, while a 60 Kd subunit, from the same sector, is related to the F-ATPases alpha subunit.
Archaebacterial membrane-associated ATPases are composed of three subunits. The alpha chain is related to F-
ATPases beta chain and the beta chain is related to F-ATPases alpha chain [4]. A protein highly similar to F-ATPase
beta subunits is found [5] in some bacterial apparatus involved in a specialized protein export pathway that proceeds
without signal peptide cleavage. This protein is known as fliI in Bacillus and Salmonella, Spa47 (mxiB) in Shigella
flexneri, HrpB6 in Xanthomonas campestris and yscN in Yersinia virulence plasmids. A segment of ten amino-acid
residues, containing two conserved serines, was selected as a signature pattern. The
first serine seems to be important for catalysis in the ATPase alpha chain.

Consensus pattern: P-[SAP]-[LIV]-[DNH]-x(3)-S-x-S [The first S is a putative active site residue]

[1] Futai M., Noumi T., Maeda M. Annu. Rev. Biochem. 58:111-136(1989).
[2] Senior A.E. Physiol. Rev. 68:177-231(1988).

[4] [...] J., Denda K., Yoshida M. Proc. Natl. Acad. Sci. [...]

[5] Dreyfus G., Williams A.W., Kawagishi I., Macnab R.M. J. Bacteriol. 175:3131-3138(1993).
15. (ATP synt ab C)

ATP synthase ab C terminal.

[1] [...] R, Walker JE; "Structure at 2.8 A resolution of F1-ATPase from bovine heart mitochondria."
Nature 1994;370:621-628.




16. (A deaminase) 

Adenosine and AMP deaminase signature 

[0205] Adenosine deaminase catalyzes the hydrolytic deamination of adenosine into inosine. AMP deaminase catalyzes
the hydrolytic deamination of AMP into IMP. It has been shown [1] that these two types of enzymes share three
regions of sequence similarity; these regions are centered on residues which are proposed to play an important role
in the catalytic mechanism of these two enzymes. One of these regions, containing two conserved aspartic acid residues
that are potential active site residues, was selected as a signature pattern.
Consensus pattern: [SA]-[LIVM]-[NGS]-[STA]-D-D-P [The two D's are putative active site residues]
[1] Chang Z., Nygaard P., Chinault A.C., Kellems R.E. Biochemistry 30:2273-2280(1991).
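
The consensus patterns in these entries use a PROSITE-style notation: elements are separated by hyphens, `x` stands for any residue, `[...]` lists the allowed residues, `{...}` lists excluded residues, and `(n)` or `(m,n)` gives a repeat count. As a minimal sketch of how such a pattern could be searched in a sequence, the notation can be translated into a regular expression. The function name `prosite_to_regex` and the toy sequence are hypothetical and introduced only for illustration; this is not the search tool used to derive the patterns.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    parts = []
    # Ignore stray leading/trailing hyphens, then split into pattern elements.
    for element in pattern.strip("-").split("-"):
        # Separate the element from an optional repeat count such as "(3)" or "(0,1)".
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                          # x = any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"      # {...} = any residue except these
        # "[...]" sets and single residue letters are already valid regex.
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)

# Adenosine/AMP deaminase signature as given in the text.
deaminase = prosite_to_regex("[SA]-[LIVM]-[NGS]-[STA]-D-D-P")

# Hypothetical toy sequence, invented for illustration only.
toy_sequence = "MKSLGTDDPAA"
match = re.search(deaminase, toy_sequence)
print(deaminase, match.start() if match else None)
```

A real scan would run the compiled expression over every protein sequence of interest; note that `re.search` reports only the first occurrence.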

17. (Acetyltransf) 

Acetyltransferase (GNAT) family.

[0206] This family contains proteins with N-acetyltransferase functions.
[1] Neuwald AF, Landsman D; Trends Biochem Sci 1997;22:154-155.

18. (Aconitase C)

Aconitase family signature 

[0207] Aconitase (aconitate hydratase) (EC 4.2.1.3) [1] is the enzyme from the tricarboxylic acid cycle that catalyzes
the reversible isomerization of citrate and isocitrate. Cis-aconitate is formed as an intermediary product during the
course of the reaction. In eukaryotes two isozymes of aconitase are known to exist: one found in the mitochondrial
matrix and the other found in the cytoplasm. Aconitase, in its active form, contains a 4Fe-4S iron-sulfur cluster; three
cysteine residues have been shown to be ligands of the 4Fe-4S cluster. It has been shown that the aconitase family
also contains the following proteins: - Iron-responsive element binding protein (IRE-BP). IRE-BP is a cytosolic protein
that binds to iron-responsive elements (IREs). IREs are stem-loop structures found in the 5'UTR of ferritin and delta
aminolevulinic acid synthase mRNAs, and in the 3'UTR of transferrin receptor mRNA. IRE-BP also expresses aconitase
activity. - 3-isopropylmalate dehydratase (EC 4.2.1.33) (isopropylmalate isomerase), the enzyme that catalyzes the
second step in the biosynthesis of leucine. - Homoaconitase (EC 4.2.1.36) (homoaconitate hydratase), an enzyme that
participates in the alpha-aminoadipate pathway of lysine biosynthesis and that converts cis-homoaconitate into
homoisocitric acid. - Escherichia coli protein ybhJ. As a signature for proteins from the aconitase family, two conserved
regions that contain the three cysteine ligands of the 4Fe-4S cluster were selected.

Consensus pattern: [LIVM]-x(2)-[GSACIVM]-x-[LIV]-[GTIV]-[STP]-C-x(0,1)-T-N-[GSTANI]-x(4)-[LIVMA] [C binds the
iron-sulfur center]

Consensus pattern: G-x(2)-[LIVWPQ]-x(3)-[GAC]-C-[GSTAM]-[LIMPTA]-C-[LIMV]-[GA] [The two C's bind the
iron-sulfur center]

[1] Gruer M.J., Artymiuk P.J., Guest J.R. Trends Biochem. Sci. 22:3-6(1997).

19. (Acyl-CoAdh) 

Acyl-CoA dehydrogenases signatures

[0208] Acyl-CoA dehydrogenases [1,2,3] are enzymes that catalyze the alpha, beta-dehydrogenation of acyl-CoA
esters and transfer electrons to ETF, the electron transfer protein. Acyl-CoA dehydrogenases are FAD flavoproteins.
This family currently includes: - Five eukaryotic isozymes that catalyze the first step of the beta-oxidation cycles for
fatty acids with various chain lengths. These are short (SCAD) (EC 1.3.99.2), medium (MCAD) (EC 1.3.99.3), long
(LCAD) (EC 1.3.99.13), very-long (VLCAD) and short/branched (SBCAD) chain acyl-CoA dehydrogenases. These
enzymes are located in the mitochondrion. They are all homotetrameric proteins of about 400 amino acid residues
except VLCAD, which is a dimer and which contains, in its mature form, about 600 residues. - Glutaryl-CoA
dehydrogenase (EC 1.3.99.7) (GCDH), which is involved in the catabolism of lysine, hydroxylysine and tryptophan. -
Isovaleryl-CoA dehydrogenase (EC 1.3.99.10) (IVD), involved in the catabolism of leucine. - Acyl-CoA dehydrogenases
acsA and mmgC from Bacillus subtilis. - Butyryl-CoA dehydrogenase (EC 1.3.99.2) from Clostridium acetobutylicum. -
Escherichia coli protein caiA [4]. - Escherichia coli protein aidB. Two conserved regions were selected as signature
patterns. The first is located in the center of these enzymes, the second in the C-terminal section.
Consensus pattern: [GAC]-[LIVM]-[ST]-E-x(2)-[GSAN]-G-[ST]-D-x(2)-[GSA]

Consensus pattern: [QDE]-x(2)-G-[GS]-x-G-[LIVMFY]-x(2)-[DEN]-x(4)-[KR]-x(3)-[DEN]

[1] Tanaka K., Ikeda Y., Matsubara Y., Hyman D.B. Enzyme 38:91-107(1987).

[2] Matsubara Y., Indo Y., Naito E., Ozasa H., Glassberg R., Vockley J., Ikeda Y., Kraus J., Tanaka K. J. Biol.
Chem. 264:16321-16331(1989).

[3] Aoyama T., Ueno I., Kamijo T., Hashimoto T. J. Biol. Chem. 269:19088-19094(1994).

[4] Eichler K., Bourgis F., Buchet A., Kleber H.-P., Mandrand-Berthelot M.-A. Mol. Microbiol. [...]

20. (Acyl transf) 

Acyl transferase domain 

[...] acyl carrier protein transacylase at 1.5-A resolution. Crystal structure of a fatty acid synthase component. J Biol Chem
1995;270:12961-12964.

21. Acylphosphatase signatures 

[0210] Acylphosphatase (EC 3.6.1.7) [1,2] catalyzes the hydrolysis of various acylphosphate carboxyl-phosphate
bonds such as carbamyl phosphate, succinyl phosphate, 1,3-diphosphoglycerate, etc. The physiological role of this
enzyme is not yet clear. Acylphosphatase is a small protein of around 100 amino-acid residues. There are two known
isozymes. One seems to be specific to muscular tissues; the other, called "organ-common type", is found in many
different tissues. While acylphosphatases have so far been characterized only in vertebrates, there are a number of
bacterial and archaebacterial hypothetical proteins that are highly similar to that enzyme and that probably possess the
same activity. These proteins are: - Escherichia coli hypothetical protein yccX. - Bacillus subtilis hypothetical protein
[...]. - Archaeoglobus fulgidus hypothetical protein AF0818. Two conserved regions were selected as signature
patterns. The first is located in the N-terminal section, while the second is found in the central part of the protein sequence.
Consensus pattern: [LIV]-x-G-x-V-Q-G-V-x-[FM]-R
Consensus pattern: G-[FYW]-[AVC]-[KRQAM]-N-x(3)-G-x-V-x(5)-G

[1] Stefani M., Ramponi G. Life Chem. Rep. 12:271-301(1995).

[2] Stefani M., Taddei N., Ramponi G. Cell. Mol. Life Sci. 53:141-151(1997).

22. (Adap comp sub) 

Clathrin adaptor complexes medium chain signatures. 

[0211] Clathrin coated vesicles (CCV) mediate intracellular membrane traffic such as receptor mediated endocytosis.
In addition to clathrin, the CCV are composed of a number of other components, including oligomeric complexes which
are known as adaptor or clathrin assembly proteins (AP) complexes [1]. The adaptor complexes are believed to interact
with the cytoplasmic tails of membrane proteins, leading to their selection and concentration. In mammals two types of
adaptor complexes are known: AP-1, which is associated with the Golgi complex, and AP-2, which is associated with
the plasma membrane. Both AP-1 and AP-2 are heterotetramers that consist of two large chains - the adaptins -
(gamma and beta' in AP-1; alpha and beta in AP-2); a medium chain (AP47 in AP-1; AP50 in AP-2) and a small chain
(AP19 in AP-1; AP17 in AP-2). The medium chains of AP-1 and AP-2 are evolutionarily related proteins of about 50 Kd.
Homologs of AP47 and AP50 have also been found in Caenorhabditis elegans (genes unc-101 and ap50) [2] and yeast
(gene APM1 or YAP54) [3]. Some more divergent, but clearly evolutionarily related proteins have also been found in
yeast: APM2 and YBR288c. Two conserved regions were selected as signature patterns, one located in the N-terminal
region, the other from the central section of these proteins.

Consensus pattern: [IVT]-[GSP]-W-R-x(2,3)-[GAD]-x(2)-[HY]-x(2)-N-x-[LIVMAFY](3)-D-[LIVM]-[LIVMT]-E
Consensus pattern: [LIV]-x-F-I-P-P-x-G-x-[LIVMFY]-x-L-x(2)-Y

[1] Pearse B.M., Robinson M.S. Annu. Rev. Cell Biol. 6:151-171(1990).

[2] Lee J., Jongeward G.D., Sternberg P.W. Genes Dev. 8:60-73(1994).

[3] Nakayama Y., Goebl M., O'Brine G.B., Lemmon S., Pingchang C.E., Kirchhausen T. Eur. J. Biochem. 202:
569-574(1991).
23. (Adenylsucc synt)
Adenylosuccinate synthetase signatures

[0212] Adenylosuccinate synthetase (EC 6.3.4.4) [1] plays an important role in purine biosynthesis, by catalyzing the
GTP-dependent conversion of IMP and aspartic acid to AMP. Adenylosuccinate synthetase has been characterized
from various sources ranging from Escherichia coli (gene purA) to vertebrate tissues. In vertebrates, two isozymes are
present - one involved in purine biosynthesis and the other in the purine nucleotide cycle. Two conserved regions were
selected as signature patterns. The first one is a perfectly conserved octapeptide located in the N-terminal section and
which is involved in GTP-binding [2]. The second one includes a lysine residue known [2] to be essential for the enzyme's
activity.

Consensus pattern: Q-W-G-D-E-G-K-G

Consensus pattern: G-I-[GR]-P-x-Y-x(2)-K-x(2)-R [K is the active site residue]

[1] Wiesmueller L., Wittbrodt J., Noegel A.A., Schleicher M. J. Biol. Chem. 266:2480-2485(1991).

[2] Silva M.M., Poland B.W., Hoffman C.R., Fromm H.J., Honzatko R.B. J. Mol. Biol. 254:431-446(1995).
[3] Bouyoub A., Barbier G., Forterre P., Labedan B. J. Mol. Biol. 261:144-154(1996).
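
Several signatures annotate one position as functionally important, as with the lysine in the second adenylosuccinate synthetase pattern, read here as G-I-[GR]-P-x-Y-x(2)-K-x(2)-R. The following is a minimal sketch of how the position of that annotated residue could be reported for each match. The hand-written regular expression, the group name `active_k`, the function name and the toy sequence are all assumptions made for illustration.

```python
import re

# G-I-[GR]-P-x-Y-x(2)-K-x(2)-R written as a regex; the named group marks
# the lysine that the text flags as the active site residue.
SIGNATURE = re.compile(r"GI[GR]P.Y..(?P<active_k>K)..R")

def active_site_positions(sequence: str) -> list:
    """Return the 1-based position of the signature's lysine for every match."""
    # finditer yields non-overlapping matches, scanning left to right.
    return [m.start("active_k") + 1 for m in SIGNATURE.finditer(sequence)]

# Hypothetical toy sequence, invented for illustration only.
toy_sequence = "AAGIGPAYAAKAARAA"
print(active_site_positions(toy_sequence))
```

Positions are reported 1-based because protein residues are conventionally numbered from 1.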

24. (AdoHcyase) 

S-adenosyl-L-homocysteine hydrolase signatures 

[0213] S-adenosyl-L-homocysteine hydrolase (EC 3.3.1.1) (AdoHcyase) is an enzyme of the activated methyl cycle,
responsible for the reversible hydration of S-adenosyl-L-homocysteine into adenosine and homocysteine.
AdoHcyase is a ubiquitous enzyme which binds and requires NAD+ as a cofactor. AdoHcyase is a highly conserved protein
[1] of about 430 to 470 amino acids. Two highly conserved regions were selected as signature patterns. The first pattern
is located in the N-terminal section; the second is derived from a glycine-rich region in the central part of AdoHcyase,
a region thought to be involved in NAD-binding.

Consensus pattern: [GSA]-[CS]-N-x-[FYLM]-S-[ST]-[QA]-[DEN]-x-[AV]-[AT]-[AD]-[AC]-[LIVMCG]
Consensus pattern: [GA]-[KS]-x(3)-[LIV]-x-G-[FY]-G-x-[VC]-G-[KRL]-G-x-[ASC]

[ 1] Sganga M.W., Aksamit R.R., Cantoni G.L., Bauer C.E. Proc. Natl. Acad. Sci. U.S.A. 89:6328-6332(1992). 

25. AhpC/TSA family 

[0214] This family contains proteins related to alkyl hydroperoxide reductase (AhpC) and thiol specific
antioxidant (TSA).

[1] Chae HZ, Robison K, Poole LB, Church G, Storz G, Rhee SG, Proc Natl Acad Sci U S A 1994;91:7017-7021.

26. (Aldose epim) 

Aldose 1-epimerase putative active site

[0215] Aldose 1-epimerase (EC 5.1.3.3) (mutarotase) is the enzyme responsible
for the anomeric interconversion of D-glucose and other aldoses between their alpha- and beta-forms. The
sequence of mutarotase from two bacteria, Acinetobacter calcoaceticus and Streptococcus thermophilus, is available [1].
It has also been shown that, on the basis of extensive sequence similarities, a mutarotase domain seems to be present
in the C-terminal half of the fungal GAL10 protein, which encodes, in the N-terminal part, UDP-glucose 4-epimerase.
The best conserved region in the sequence of mutarotase is centered around a conserved histidine residue which may
be involved in the catalytic mechanism.
Consensus pattern: [NS]-x-T-N-H-x-Y-[FW]-N-[LI]

[1] Poolman B., Royer T.J., Mainzer S.E., Schmidt B.F. J. Bacteriol. 172:4037-4047(1990).

27. (AlkA DNA repair) 

Alkylbase DNA glycosidases alkA family signature

[0216] Alkylbase DNA glycosidases [1] are DNA repair enzymes that hydrolyze the deoxyribose N-glycosidic bond
to excise various alkylated bases from a damaged DNA polymer. In Escherichia coli there are two alkylbase DNA
glycosidases: one (gene tag) which is constitutively expressed and which is specific for the removal of 3-methyladenine
(EC 3.2.2.20), and one (gene alkA) which is induced during adaptation to alkylation and which can remove a variety
of alkylation products (EC 3.2.2.21). Tag and alkA do not share any region of sequence similarity. In yeast there is an
alkylbase DNA glycosidase (gene MAG1) [2,3], which can remove 3-methyladenine or 7-methyladenine and which is
structurally related to alkA. MAG and alkA are both proteins of about 300 amino acid residues. While the C- and
N-terminal ends appear to be unrelated, there is a central region of about 130 residues which is well conserved. A portion
of this region has been selected as a signature pattern.

Consensus pattern: G-I-G-x-W-[ST]-[AV]-x-[LIVMFY](2)-x-[LIVM]-x(8)-[MF]-x(2)-[...]

[1] Lindahl T., Sedgwick B. Annu. Rev. Biochem. 57:133-157(1988).

[2] Berdal K.G., Bjoras M., Bjelland S., Seeberg E.C. EMBO J. 9:4563-4568(1990).

[3] Chen J., Derfler B., Samson L. EMBO J. 9:4569-4575(1990).

28. Ammonium transporters signature 

[0217] A number of proteins involved in the transport of ammonium ions across a membrane, as well as some yet
uncharacterized proteins, have been shown [1,2] to be evolutionarily related. These proteins are: - Yeast ammonium
transporters MEP1, MEP2 and MEP3. - Arabidopsis thaliana high affinity ammonium transporter (gene AMT1). -
Corynebacterium glutamicum ammonium and methylammonium transport system. - Escherichia coli putative ammonium
transporter amtB. - Bacillus subtilis nrgA. - Mycobacterium tuberculosis hypothetical protein MtCY338.09c. -
Synechocystis strain PCC 6803 hypothetical proteins sll0108, sll0537 and sll1017. - Methanococcus jannaschii hypothetical
proteins MJ0058 and MJ1343. - Caenorhabditis elegans hypothetical proteins C05E11.4, F49E11.3 and M195.3. As
expected from their transport function, these proteins are highly hydrophobic and seem to contain from 10 to 12
transmembrane domains. The best conserved region seems to be located in the fifth (or sixth) transmembrane region and
was selected as a signature pattern.

Consensus pattern: [...]-R

[1] Ninnemann O., Jauniaux J.-C., Frommer W.B. EMBO J. 13:3464-3471(1994).

[2] Siewe R.M., Weil B., Burkovski A., Eikmanns B.J., Eikmanns M., Kraemer R. J. Biol. Chem. 271:5398-5403(1996).

[3] Saier M.H. Jr. Adv. Microbiol. Physiol. 40:81-136(1998).

29. (Arch_histone) 
CBF/NF-Y subunits signatures 

[0218] Diverse DNA binding proteins are known to bind the CCAAT box, a common cis-acting element found in the
promoter and enhancer regions of a large number of genes in eukaryotes. Amongst these proteins is one known as
the CCAAT-binding factor (CBF) or NF-Y [1]. CBF is a heteromeric transcription factor that consists of two different
components, both needed for DNA-binding. The HAP protein complex of yeast binds to the upstream activation site of
cytochrome C iso-1 gene (CYC1) as well as other genes involved in mitochondrial electron transport and activates
their expression. It also recognizes the sequence CCAAT and is structurally and evolutionarily related to CBF. The first
subunit of CBF, known as CBF-A or NF-YB in vertebrates, HAP3 in budding yeast and as php3 in fission yeast, is a
protein of 116 to 210 amino-acid residues which contains a highly conserved central domain of about 90 residues. This
domain seems to be involved in DNA-binding; a signature pattern has been developed from its central part. The second
subunit of CBF, known as CBF-B or NF-YA in vertebrates, HAP2 in budding yeast and php2 in fission yeast, is a protein
of 265 to 350 amino-acid residues which contains a highly conserved region of about 60 residues. This region, called
the 'essential core' [2], seems to consist of two subdomains: an N-terminal subunit-association domain and a C-terminal
DNA recognition domain. A signature pattern has been developed from a section of the subunit-association domain.
Consensus pattern: C-V-S-E-x-I-S-F-[LIVM]-T-[SG]-E-A-[SC]-[DE]-[KRQ]-C
Consensus pattern: Y-V-N-A-K-Q-Y-x-R-I-L-K-R-R-x-A-R-A-K-L-E

[1] Li X.-Y., Mantovani R., Hooft van Huijsduijnen R., Andre I., Benoist C., Mathis D. Nucleic Acids Res. 20:
1087-1091(1992).

[2] Olesen J.T., Fikes J.D., Guarente L. Mol. Cell. Biol. 11:611-619(1991).

30. Argininosuccinate synthase signatures 

[0219] Argininosuccinate synthase (EC 6.3.4.5) (AS) is a urea cycle enzyme that catalyzes the penultimate step in
arginine biosynthesis: the ATP-dependent ligation of citrulline to aspartate to form argininosuccinate, AMP and
pyrophosphate [1,2]. In humans, a defect in the AS gene causes citrullinemia, a genetic disease characterized by severe
vomiting spells and mental retardation. AS is a homotetrameric enzyme of chains of about 400 amino-acid residues.
An arginine seems to be important for the enzyme's catalytic mechanism. The sequences of AS from various
prokaryotes, archaebacteria and eukaryotes show significant similarity. Two signature patterns have been selected for AS.
The first is a highly conserved stretch of nine residues located in the N-terminal extremity of these enzymes; the second
is derived from a conserved region which contains one of the conserved arginine residues.
Consensus pattern: [AS]-[FY]-S-G-G-[LV]-D-T-[ST]
Consensus pattern: G-x-T-x-K-G-N-D-x(2)-R-F

[1] van Vliet F., Crabeel M., Boyen A., Tricot C., Stalon V., Falmagne P., Nakamura Y., Baumberg S., Glansdorff
N. Gene 95:99-104(1990).

[2] Morris C.J., Reeve J.N. J. Bacteriol. 170:3125-3130(1988).

31. Armadillo/beta-catenin-like repeats

[0220] Approx. 40 amino acid repeat. Tandem repeats form a super-helix of helices that is proposed to mediate
interaction of beta-catenin with its ligands. CAUTION: This family does not contain all known armadillo repeats.

[1] Huber AH, Nelson WJ, Weis WI, Cell 1997;90:871-882.

[2] Gumbiner BM, Curr Opin Cell Biol 1995;7:634-640.

[3] Cavallo R, Rubenstein D, Peifer M, Curr Opin Genet Dev 1997;7:459-466.
[4] Su LK, Vogelstein B, Kinzler KW, Science 1993;262:1734-1737.
[5] Masiarz FR, Munemitsu S, Polakis P, Science 1993;262:1731-1734.
[6] Peifer M, Wieschaus E, Cell 1990;63:1167-1176.

32. (Asn Synthase) 
Asparagine synthase 

[0221] This family is always found associated with GATase_2. Members of this family catalyse the conversion of 
aspartate to asparagine. 

33. Asparaginase_2 

Asparaginase 12 members 

34. (Aspartyl tRNA N) 

Aminoacyl-transfer RNA synthetases class-II signatures

[0222] Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer
them to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least
twenty different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are
generally two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and one mitochondrial form.
While all these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary
structure. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine,
proline, serine, and threonine are referred to as class-II synthetases [2 to 6] and probably have a common folding
pattern in their catalytic domain for the binding of ATP and amino acid, which is different from the Rossmann fold observed
for the class I synthetases [7]. Class-II tRNA synthetases do not share a high degree of similarity; however, at least
three conserved regions are present [2,5,8]. Signature patterns have been derived from two of these regions.
Consensus pattern: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]

Consensus pattern: [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVMFY]

[1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987).

[2] Delarue M., Moras D. BioEssays 15:675-687(1993).
[3] Schimmel P. Trends Biochem. Sci. 16:1-3(1991).

[4] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).
[5] Cusack S., Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991).
[6] Cusack S. Biochimie 75:1077-1081(1993).
[7] Cusack S., Berthet-Colominas C., Haertlein M., Nassar N., Leberman R. [...]

[8] Leveque F., Plateau P., Dessen P., Blanquet S. Nucleic Acids Res. 18:305-312(1990).
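
Two notational features of the class-II signatures deserve a concrete illustration: the variable-length gap x(4,12) and the exclusion set written in braces. The sketch below writes both signatures by hand as regular expressions and checks how they behave on invented toy sequences. The second pattern is read here as [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVMFY], which is an assumption where the printed text is unclear; the names `MOTIF_1`, `MOTIF_2` and `has_motif_1` are likewise hypothetical.

```python
import re

# [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]: x(4,12) becomes .{4,12}
MOTIF_1 = re.compile(r"[FYH]R.[DE].{4,12}[RH].{3}F.{3}[DE]")

# {DENQHRKP} means "any residue except these", i.e. a negated character class.
MOTIF_2 = re.compile(r"[GSTALVF][^DENQHRKP][GSTA][LIVMF][DE]R[LIVMF].[LIVMSTAG][LIVMFY]")

def has_motif_1(gap_length: int) -> bool:
    """Build a toy sequence with the given gap length and test it against MOTIF_1."""
    toy = "FRAD" + "G" * gap_length + "R" + "GGG" + "F" + "GGG" + "D"
    # fullmatch forces the whole toy sequence to be the motif, so the gap
    # length is the only thing that can make the test pass or fail.
    return MOTIF_1.fullmatch(toy) is not None

for gap in (3, 4, 12, 13):
    print(gap, has_motif_1(gap))

# The second residue may be anything except D, E, N, Q, H, R, K or P.
print(MOTIF_2.fullmatch("GAGLDRLALL") is not None)
print(MOTIF_2.fullmatch("GDGLDRLALL") is not None)
```

Gaps of 4 to 12 residues are accepted and anything shorter or longer is rejected, which is why such a signature tolerates insertions between the conserved positions without matching arbitrarily distant residues.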

35. (ArfGap) Putative GTP-ase activating protein for Arf. Putative zinc fingers with GTPase activating proteins (GAPs)
towards the small GTPase Arf. The GAP of ARD1 stimulates GTPase hydrolysis for ARD1 but not ARFs. Number of
members: 34

[0223] [1] [...] ADP-ribosylation-factor GTPase-activating protein and two pleckstrin homology domains. Tanaka K, Imajoh-[...]

[2] [...] protein 1 (ARD1). Vitale N, Moss J, Vaughan M; J Biol Chem 1998;273:2553-2560.

36. Apolipoprotein. Apolipoprotein A1/A4/E family. This family includes [...] Swiss:P06727 Apolipoprotein A-IV and
Swiss:P02649 Apolipoprotein E. These proteins contain several 22 residue repeats which
form a pair of alpha helices. Number of members: 42

[0224] [1] Medline: 91289138. Three-dimensional structure of the receptor-binding domain of human
apolipoprotein E. Wilson C, Wardell MR, Weisgraber KH, Mahley RW, Agard DA; Science 1991;252:1817-1822.

37. Amino acid permeases signature

Consensus pattern: [...]-[LIVMFYWT]-x-[LIVMST]-x(3)-[LIVMCTA]-[GA]-E-x(5)-[PSAL]

[1] Weber E., Chevalier M.R., Jund R. J. Mol. Evol. 27:341-350(1988).
[2] Vandenbol M., Jauniaux J.-C., Grenson M. Gene 83:153-159(1989).

[3] Reizer J., Finley K., Kakuda D., McLeod C.L., Reizer A., Saier M.H. Jr. Protein Sci. 2:20-30(1993).
38. aakinase (1) Glutamate 5-kinase signature 

[0226] Glutamate 5-kinase (EC 2.7.2.11) (gamma-glutamyl kinase) (GK) is the enzyme that catalyzes the first step
in the biosynthesis of proline from glutamate, the ATP-dependent phosphorylation of L-glutamate into L-glutamate
5-phosphate. In eubacteria (gene proB) and yeast [1] (gene PRO1), GK is a monofunctional protein, while in plants
and mammals, it is a bifunctional enzyme (P5CS) [2] that consists of two domains: an N-terminal GK domain and a
C-terminal gamma-glutamyl phosphate reductase domain (EC 1.2.1.41) (see <PDOC00940>). As a signature pattern, a
highly conserved glycine- and alanine-rich region located in the central section of these enzymes has been selected.
Yeast hypothetical protein YHR033w is highly similar to GK.

Consensus pattern: [GSTN]-x(2)-G-x-G-[GC]-[IM]-x-[STA]-K-[LIVM]-x-[SA]-[TCA]-x(2)-[GALV]-x(3)-G

[1] Li W., Brandriss M.C. J. Bacteriol. 174:4148-4156(1992).
[2] Hu C.-A.A., Delauney A.J., Verma D.P.S. Proc. Natl. Acad. Sci. U.S.A. 89:9354-9358(1992).

aakinase (2) Aspartokinase signature

[0227] Aspartokinase (EC 2.7.2.4) (AK) [1] catalyzes the phosphorylation of aspartate. The product of this reaction
can then be used in the biosynthesis of lysine or in the pathway leading to homoserine, which participates in the
biosynthesis of threonine, isoleucine and methionine. In Escherichia coli, there are three different isozymes which differ
in their sensitivity to repression and inhibition by Lys, Met and Thr. AK1 (gene thrA) and AK2 (gene metL) are bifunctional
enzymes which both consist of an N-terminal AK domain and a C-terminal homoserine dehydrogenase domain. AK1
is involved in threonine biosynthesis and AK2 in that of methionine. The third isozyme, AK3 (gene lysC), is
monofunctional and involved in lysine synthesis. In yeast, there is a single isozyme of AK (gene HOM3). As a signature pattern
for AK, a conserved region located in the N-terminal extremity has been selected.
Consensus pattern: [LIVM]-x-K-[FY]-G-G-[ST]-[SC]-[LIVM]
[1] Rafalski J.A., Falco S.C. J. Biol. Chem. 263:2146-2151(1988).

aakinase (3) Gamma-glutamyl phosphate reductase signature

[0228] Gamma-glutamyl phosphate reductase (EC 1.2.1.41) (GPR) is the enzyme that catalyzes the second step in
the biosynthesis of proline from glutamate, the NADP-dependent reduction of L-glutamate 5-phosphate into
L-glutamate 5-semialdehyde and phosphate. In eubacteria (gene proA) and yeast [1] (gene PRO2), GPR is a monofunctional
protein, while in plants and mammals, it is a bifunctional enzyme (P5CS) [2] that consists of two domains: an N-terminal
glutamate 5-kinase domain (EC 2.7.2.11) (see <PDOC00701>) and a C-terminal GPR domain. As a signature pattern,
a conserved region that contains two histidine residues has been selected. This region is located in the last third of GPR.
Consensus pattern: V-x(5)-A-[LIV]-x-H-I-x(2)-[HY]-[GS]-[ST]-x-H-[ST]-[DE]-x-I

[1] Pearson B.M., Hernando Y., Payne J., Wolf S.S., Kalogeropoulos A., Schweizer M. Yeast 12:1021-1031(1996).

[2] Hu C.-A.A., Delauney A.J., Verma D.P.S. Proc. Natl. Acad. Sci. U.S.A. 89:9354-9358(1992).

39. (abhydrolase) alpha/beta hydrolase fold. This catalytic domain is found in a very wide range of enzymes. 

[0229] [1] Ollis DL, Cheah E, Cygler M, Dijkstra B, Frolow F, Franken SM, Harel M, Remington SJ, Silman I, Schrag
J, Sussman JL, Verschueren KHG, Goldman A; Protein Eng 1992;5:197-211.

40. (Acid phosphat) Histidine acid phosphatases signatures 

[0230] Acid phosphatases (EC 3.1.3.2) are a heterogeneous group of proteins that hydrolyze phosphate esters,
optimally at low pH. It has been shown [1] that a number of acid phosphatases, from both prokaryotes and eukaryotes,
share two regions of sequence similarity, each centered around a conserved histidine residue. These two histidines
seem to be involved in the enzymes' catalytic mechanism [2,3]. The first histidine is located in the N-terminal section
and forms a phosphohistidine intermediate, while the second is located in the C-terminal section and possibly acts as
a proton donor. Enzymes belonging to this family are called 'histidine acid phosphatases' and are listed below:

- Escherichia coli pH 2.5 acid phosphatase (gene appA).
- Escherichia coli glucose-1-phosphatase (EC 3.1.3.10) (gene agp).
- Yeast constitutive and repressible acid phosphatases (genes PHO3 and PHO5).
- Fission yeast acid phosphatase (gene pho1).
- Aspergillus phytases A and B (EC 3.1.3.8) (genes phyA and phyB).
- Mammalian lysosomal acid phosphatase.
- Mammalian prostatic acid phosphatase.
- Caenorhabditis elegans hypothetical proteins B0361.7, C05C10.1, C05C10.4 and F26C11.1.

[0231] Consensus pattern: [LIVM]-x(2)-[LIVMA]-x(2)-[LIVM]-x-R-H-[GN]-x-R-x-[PAS] [H is the phosphohistidine residue]

Consensus pattern: [LIVMF]-x-[LIVMFAG]-x(2)-[STAGI]-H-D-[STANQ]-x-[LIVM]-x(2)-[...] [H is an active site residue]

Sequences known to belong to this class detected by the pattern: ALL, except for rat prostatic acid
phosphatase, which seems to have Tyr instead of the active site His.

[1] van Etten R.L., Davidson R., Stevis P.E., MacArthur H., Moore D.L. J. Biol. Chem. 266:2313-2319(1991).
[2] Ostanin K., Harris E.H., Stevis P.E., Kuciel R., Zhou M.-M., van Etten R.L. J. Biol. Chem. 267:22830-22836(1992).

[3] Schneider G., Lindqvist Y., Vihko P. EMBO J. 12:2609-2615(1993).
41. Aconitase family signatures 

[0232] Aconitase (aconitate hydratase) (EC 4.2.1.3) [1] is the enzyme from the tricarboxylic acid cycle that catalyzes
the reversible isomerization of citrate and isocitrate. Cis-aconitate is formed as an intermediary product during the
course of the reaction. In eukaryotes two isozymes of aconitase are known to exist: one found in the mitochondrial
matrix and the other found in the cytoplasm. Aconitase, in its active form, contains a 4Fe-4S iron-sulfur cluster; three
cysteine residues have been shown to be ligands of the 4Fe-4S cluster. It has been shown that the aconitase family
also contains the following proteins: - Iron-responsive element binding protein (IRE-BP). IRE-BP is a cytosolic protein
that binds to iron-responsive elements (IREs). IREs are stem-loop structures found in the 5'UTR of ferritin and delta
aminolevulinic acid synthase mRNAs, and in the 3'UTR of transferrin receptor mRNA. IRE-BP also expresses aconitase
activity. - 3-isopropylmalate dehydratase (EC 4.2.1.33) (isopropylmalate isomerase), the enzyme that catalyzes the
second step in the biosynthesis of leucine. - Homoaconitase (EC 4.2.1.36) (homoaconitate hydratase), an enzyme that
participates in the alpha-aminoadipate pathway of lysine biosynthesis and that converts cis-homoaconitate into
homoisocitric acid. - Escherichia coli protein ybhJ. [...]

Consensus pattern: G-x(2)-[LIVWPQ]-x(3)-[GAC]-C-[GSTAM]-[LIMPTA]-C-[LIMV]-[GA] [The two C's bind the
iron-sulfur center]

[1] Gruer M.J., Artymiuk P.J., Guest J.R. Trends Biochem. Sci. 22:3-6(1997).
42. Actins signatures 

[0233] Actins [1 to 4] are highly conserved contractile proteins that are present in all eukaryotic cells. In vertebrates
there are three groups of actin isoforms: alpha, beta and gamma. The alpha actins are found in muscle tissues and
are a major constituent of the contractile apparatus. The beta and gamma actins coexist in most cell types as
components of the cytoskeleton and as mediators of internal cell motility. In plants [5] there are many isoforms which are
probably involved in a variety of functions such as cytoplasmic streaming, cell shape determination, tip growth,
graviperception, cell wall deposition, etc. Actin exists either in a monomeric form (G-actin) or in a polymerized form (F-actin).
Each actin monomer can bind a molecule of ATP; when polymerization occurs, the ATP is hydrolyzed. Actin is a protein
of from 374 to 379 amino acid residues. The structure of actin has been highly conserved in the course of evolution.
Recently some divergent actin-like proteins have been identified in several species. These proteins are: - Centractin
(actin-RPV) from mammals, fungi (yeast ACT5, Neurospora crassa ro-4) and Pneumocystis carinii (actin-II). Centractin
seems to be a component of a multi-subunit centrosomal complex involved in microtubule based vesicle motility. This
subfamily is also known as ARP1. - ARP2 subfamily which includes chicken ACTL, yeast ACT2, [...] and Caenorhabditis
elegans actC. - ARP3 subfamily which includes actin 2 from mammals, Drosophila 66B, yeast ACT4 and fission yeast
act2. - ARP4 subfamily which includes yeast ACT3 and Drosophila 13E. Three signature patterns have been developed.
The first two are specific to actins and span positions 54 to 64 and 357 to 365. The last signature picks up both actins
and the actin-like proteins and corresponds to positions 106 to 118 in actins.
Consensus pattern: [FY]-[LIV]-G-[DE]-E-A-Q-x-[RKQ](2)-G
Consensus pattern: W-[IV]-[STA]-[RK]-x-[DE]-Y-[DNE]-[DE]

Consensus pattern: [LM]-[LIVM]-T-E-[GAPQ]-x-[LIVMFYWHQ]-N-[PSTAQ]-x(2)-N-[KR]

[1] Sheterline P., Clayton J., Sparrow J.C. (In) Actins, 3rd Edition, Academic Press Ltd, London, (1996).
[2] Pollard T.D., Cooper J.A. Annu. Rev. Biochem. 55:987-1036(1986).
[3] Pollard T.D. Curr. Opin. Cell Biol. 1:33-40(1990).
[4] Rubenstein P.A. BioEssays 12:309-315(1990).

[5] Meagher R.B., McLean B.G. Cell Motil. Cytoskeleton 16:164-166(1990).
43. Adenylate kinase signature 

[0234] Adenylate kinase (EC 2.7.4.3) (AK) [1] is a small monomeric enzyme that catalyzes the reversible transfer of a phosphoryl group from MgATP to AMP (MgATP + AMP = MgADP + ADP). In mammals there are three different isozymes: - AK1 (or myokinase), which is cytosolic. - AK2, which is located in the outer compartment of mitochondria. - AK3 (or GTP:AMP phosphotransferase), which is located in the mitochondrial matrix and which uses MgGTP instead of MgATP. The sequence of AK has also been obtained from different bacterial species and from plants and fungi. Two other enzymes have been found to be evolutionarily related to AK. These are: - Yeast uridylate kinase (EC 2.7.4.-) (UK) (gene URA6) [2] which catalyzes the transfer of a phosphate group from ATP to UMP to form UDP and ADP. - Slime mold UMP-CMP kinase (EC 2.7.4.14) [3] which catalyzes the transfer of a phosphate group from ATP to either CMP or UMP to form CDP or UDP and ADP. Several regions of AK family enzymes are well conserved, including the ATP-binding domains. The most conserved of all regions has been selected as a signature for this type of enzyme. This region includes an aspartic acid residue that is part of the catalytic cleft of the enzyme and that is involved in a salt bridge. It also includes an arginine residue whose modification leads to inactivation of the enzyme.
Consensus pattern: [LIVMFYW](3)-D-G-[FYI]-P-R-x(3)-[NQ]

[ 1] Schulz G.E. Cold Spring Harbor Symp. Quant. Biol. 52:429-439(1987).
[ 2] Liljelund P., Sanni A., Friesen J.D., Lacroute F. Biochem. Biophys. Res. Commun. 165:464-473(1989).
[ 3] Wiesmueller L., Noegel A.A., Barzu O., Gerisch G., Schleicher M. J. Biol. Chem. 265:6339-6345(1990).
[ 4] Kath T.H., Schmid R., Schaefer G. Arch. Biochem. Biophys. 307:405-410(1993).

[0235] 44. (adh_short) Short-chain dehydrogenases/reductases family signature.

[0236] The short-chain dehydrogenases/reductases family (SDR) [1] is a very large family of enzymes, most of which are known to be NAD- or NADP-dependent oxidoreductases. As the first member of this family to be characterized was Drosophila alcohol dehydrogenase, this family used to be called [2,3,4] 'insect-type', or 'short-chain' alcohol dehydrogenases. Most members of this family are proteins of about 250 to 300 amino acid residues. The proteins currently known to belong to this family are listed below. - Alcohol dehydrogenase (EC 1.1.1.1) from insects such as Drosophila. - Acetoin dehydrogenase (EC 1.1.1.5) from Klebsiella terrigena (gene budC). - D-beta-hydroxybutyrate dehydrogenase (BDH) (EC 1.1.1.30) from mammals. - Acetoacetyl-CoA reductase (EC 1.1.1.36) from various bacterial species (gene phbB or phaB). - Glucose 1-dehydrogenase (EC 1.1.1.47) from Bacillus. - 3-beta-hydroxysteroid dehydrogenase (EC 1.1.1.51) from Comamonas testosteroni. - 20-beta-hydroxysteroid dehydrogenase (EC 1.1.1.53) from Streptomyces hydrogenans. - Ribitol dehydrogenase (EC 1.1.1.56) (RDH) from Klebsiella aerogenes. - Estradiol 17-beta-dehydrogenase (EC 1.1.1.62) from human. - Gluconate 5-dehydrogenase (EC 1.1.1.69) from Gluconobacter oxydans (gene gno). - 3-oxoacyl-[acyl-carrier protein] reductase (EC 1.1.1.100) from Escherichia coli (gene fabG) and from plants. - Retinol dehydrogenase (EC 1.1.1.105) from mammals. - 2-deoxy-d-gluconate 3-dehydrogenase (EC 1.1.1.125) from Escherichia coli and Erwinia chrysanthemi (gene kduD). - Sorbitol-6-phosphate 2-dehydrogenase (EC 1.1.1.140) from Escherichia coli (gene gutD) and from Klebsiella pneumoniae (gene sorD). - 15-hydroxyprostaglandin dehydrogenase (NAD+) (EC 1.1.1.141) from human. - Corticosteroid 11-beta-dehydrogenase (EC 1.1.1.146) (11-DH) from mammals. - 7-alpha-hydroxysteroid dehydrogenase (EC 1.1.1.159) from Escherichia coli (gene hdhA), Eubacterium strain VPI 12708 (gene baiA) and from Clostridium sordellii. - NADPH-dependent carbonyl reductase (EC 1.1.1.184) from mammals. - Tropinone reductase-I (EC 1.1.1.206) and -II (EC 1.1.1.236) from plants. - N-acylmannosamine 1-dehydrogenase (EC 1.1.1.233) from Flavobacterium strain 141-8. - D-arabinitol 2-dehydrogenase (ribulose forming) (EC 1.1.1.250) from fungi. - Tetrahydroxynaphthalene reductase (EC 1.1.1.252) from Magnaporthe grisea. - Pteridine reductase 1 (EC 1.1.1.253) (gene PTR1) from Leishmania. - 2,5-dichloro-2,5-cyclohexadiene-1,4-diol dehydrogenase (EC 1.1.-.-) from Pseudomonas paucimobilis. - Cis-1,2-dihydroxy-3,4-cyclohexadiene-1-carboxylate dehydrogenase (EC 1.3.1.-) from Acinetobacter calcoaceticus (gene benD) and Pseudomonas putida (gene xylL). - Biphenyl-2,3-dihydro-2,3-diol dehydrogenase (EC 1.3.1.-) (gene bphB) from various Pseudomonaceae. - Cis-toluene dihydrodiol dehydrogenase (EC 1.3.1.-) from Pseudomonas putida (gene todD). - Cis-benzene glycol dehydrogenase (EC 1.3.1.19) from Pseudomonas putida (gene bnzE). - 2,3-dihydro-2,3-dihydroxybenzoate dehydrogenase (EC 1.3.1.28) from Escherichia coli (gene entA) and Bacillus subtilis (gene dhbA). - Dihydropteridine reductase (EC 1.6.99.7) (HDHPR) from mammals. - Lignin degradation enzyme ligD from Pseudomonas paucimobilis. - Agropine synthesis reductase from Agrobacterium plasmids (gene mas1). - Versicolorin reductase from Aspergillus parasiticus (gene VER1). - Putative keto-acyl reductases from Streptomyces polyketide biosynthesis operons. - A trifunctional hydratase-dehydrogenase-epimerase from the peroxisomal beta-oxidation system of Candida tropicalis. This protein contains two tandemly repeated 'short-chain dehydrogenase-type' domains in its N-terminal extremity. - Nodulation protein nodG from species of Azospirillum and Rhizobium which is probably involved in the modification of the nodulation Nod factor fatty acyl chain. - Nitrogen fixation protein fixR from Bradyrhizobium japonicum. - Bacillus subtilis protein dltE which is involved in the biosynthesis of D-alanyl-lipoteichoic acid. - Human follicular variant translocation protein 1 (FVT1). - Mouse adipocyte protein p27. - Mouse protein Ke 6. - Maize sex determination protein TASSELSEED 2. - Sarcophaga peregrina 25 Kd development specific protein. - Drosophila fat body protein P6. - A Listeria monocytogenes hypothetical protein encoded in the internalins gene region. - Escherichia coli hypothetical protein yciK. - Escherichia coli hypothetical protein ydfG. - Escherichia coli hypothetical protein yjgI. - Escherichia coli hypothetical protein yjgU. - Escherichia coli hypothetical protein yohF. - Bacillus subtilis hypothetical protein yoxD. - Bacillus subtilis hypothetical protein ywfD. - Bacillus subtilis hypothetical protein ywfH. - Yeast hypothetical protein YIL124w. - Yeast hypothetical protein YIR035c. - Yeast hypothetical protein YIR036c. - Yeast hypothetical protein YKL055c. - Fission yeast hypothetical protein SpAC23D3.11. One of the best conserved regions, which includes two perfectly conserved residues, a tyrosine and a lysine, has been selected as a signature pattern for this family of proteins. The tyrosine residue participates in the catalytic mechanism.

Consensus pattern: [LIVSPADNK]-x(12)-Y-[PSTAGNCV]-[STAGNQCIVM]-[STAGC]-K-{PC}-[SAGFYR]-[LIVMSTAGD]-x(2)-[LIVMFYW]-x(3)-[LIVMFYWGAPTHQ]-[GSACQRHM] [Y is an active site residue]
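
The short-chain dehydrogenase signature is the only pattern in this group that uses an exclusion set, i.e. a position at which any residue except those listed in braces is accepted. A minimal sketch of how that element behaves is given below, assuming Python and a hand translation of the pattern as read here (x(12) becomes .{12}, the braces after K become a negated class). The names SDR_SIGNATURE and has_sdr_signature and both test strings are hypothetical.

```python
import re

# Hand translation of the signature: x(n) -> .{n}, {PC} -> [^PC].
SDR_SIGNATURE = re.compile(
    r"[LIVSPADNK].{12}Y[PSTAGNCV][STAGNQCIVM][STAGC]K[^PC][SAGFYR]"
    r"[LIVMSTAGD].{2}[LIVMFYW].{3}[LIVMFYWGAPTHQ][GSACQRHM]"
)

def has_sdr_signature(sequence: str) -> bool:
    """Return True if the sequence contains at least one match to the signature."""
    return SDR_SIGNATURE.search(sequence) is not None

# Synthetic strings built only to exercise the pattern; they are not real proteins.
accepted = "L" + "A" * 12 + "YSASK" + "A" + "ALAALAAALG"
rejected = "L" + "A" * 12 + "YSASK" + "P" + "ALAALAAALG"  # P is excluded after K
```

The two strings differ only at the position following the conserved lysine, so the first is accepted and the second is not.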

[ 1] Joernvall H., Persson B., Krook M., Atrian S., Gonzalez-Duarte R., Jeffery J., Ghosh D. Biochemistry 34:6003-6013(1995).
[ 2] Villarroya A., Juan E., Egestad B., Joernvall H. Eur. J. Biochem. 180:191-197(1989).
[ 3] Persson B., Krook M., Joernvall H. Eur. J. Biochem. 200:537-543(1991).
[ 4] Neidle E.L., Hartnett C., Ornston N.L., Bairoch A., Rekik M., Harayama S. Eur. J. Biochem. 204:113-120(1992).

[0237] 45. (adh_short_C2) Short-chain dehydrogenases/reductases family signature

The short-chain dehydrogenases/reductases family (SDR) [1] is a very large family of enzymes, most of which are known to be NAD- or NADP-dependent oxidoreductases. As the first member of this family to be characterized was Drosophila alcohol dehydrogenase, this family used to be called [2,3,4] 'insect-type', or 'short-chain' alcohol dehydrogenases. Most members of this family are proteins of about 250 to 300 amino acid residues. The proteins currently known to belong to this family are listed below. - Alcohol dehydrogenase (EC 1.1.1.1) from insects such as Drosophila. - Acetoin dehydrogenase (EC 1.1.1.5) from Klebsiella terrigena (gene budC). - D-beta-hydroxybutyrate dehydrogenase (BDH) (EC 1.1.1.30) from mammals. - Acetoacetyl-CoA reductase (EC 1.1.1.36) from various bacterial species (gene phbB or phaB). - Glucose 1-dehydrogenase (EC 1.1.1.47) from Bacillus. - 3-beta-hydroxysteroid dehydrogenase (EC 1.1.1.51) from Comamonas testosteroni. - 20-beta-hydroxysteroid dehydrogenase (EC 1.1.1.53) from Streptomyces hydrogenans. - Ribitol dehydrogenase (EC 1.1.1.56) (RDH) from Klebsiella aerogenes. - Estradiol 17-beta-dehydrogenase (EC 1.1.1.62) from human. - Gluconate 5-dehydrogenase (EC 1.1.1.69) from Gluconobacter oxydans (gene gno). - 3-oxoacyl-[acyl-carrier protein] reductase (EC 1.1.1.100) from Escherichia coli (gene fabG) and from plants. - Retinol dehydrogenase (EC 1.1.1.105) from mammals. - 2-deoxy-d-gluconate 3-dehydrogenase (EC 1.1.1.125) from Escherichia coli and Erwinia chrysanthemi (gene kduD). - Sorbitol-6-phosphate 2-dehydrogenase (EC 1.1.1.140) from Escherichia coli (gene gutD) and from Klebsiella pneumoniae (gene sorD). - 15-hydroxyprostaglandin dehydrogenase (NAD+) (EC 1.1.1.141) from human. - Corticosteroid 11-beta-dehydrogenase (EC 1.1.1.146) (11-DH) from mammals. - 7-alpha-hydroxysteroid dehydrogenase (EC 1.1.1.159) from Escherichia coli (gene hdhA), Eubacterium strain VPI 12708 (gene baiA) and from Clostridium sordellii. - NADPH-dependent carbonyl reductase (EC 1.1.1.184) from mammals. - Tropinone reductase-I (EC 1.1.1.206) and -II (EC 1.1.1.236) from plants. - N-acylmannosamine 1-dehydrogenase (EC 1.1.1.233) from Flavobacterium strain 141-8. - D-arabinitol 2-dehydrogenase (ribulose forming) (EC 1.1.1.250) from fungi. - Tetrahydroxynaphthalene reductase (EC 1.1.1.252) from Magnaporthe grisea. - Pteridine reductase 1 (EC 1.1.1.253) (gene PTR1) from Leishmania. - 2,5-dichloro-2,5-cyclohexadiene-1,4-diol dehydrogenase (EC 1.1.-.-) from Pseudomonas paucimobilis. - Cis-1,2-dihydroxy-3,4-cyclohexadiene-1-carboxylate dehydrogenase (EC 1.3.1.-) from Acinetobacter calcoaceticus (gene benD) and Pseudomonas putida (gene xylL). - Biphenyl-2,3-dihydro-2,3-diol dehydrogenase (EC 1.3.1.-) (gene bphB) from various Pseudomonaceae. - Cis-toluene dihydrodiol dehydrogenase (EC 1.3.1.-) from Pseudomonas putida (gene todD). - Cis-benzene glycol dehydrogenase (EC 1.3.1.19) from Pseudomonas putida (gene bnzE). - 2,3-dihydro-2,3-dihydroxybenzoate dehydrogenase (EC 1.3.1.28) from Escherichia coli (gene entA) and Bacillus subtilis (gene dhbA). - Dihydropteridine reductase (EC 1.6.99.7) (HDHPR) from mammals. - Lignin degradation enzyme ligD from Pseudomonas paucimobilis. - Agropine synthesis reductase from Agrobacterium plasmids (gene mas1). - Versicolorin reductase from Aspergillus parasiticus (gene VER1). - Putative keto-acyl reductases from Streptomyces polyketide biosynthesis operons. - A trifunctional hydratase-dehydrogenase-epimerase from the peroxisomal beta-oxidation system of Candida tropicalis. This protein contains two tandemly repeated 'short-chain dehydrogenase-type' domains in its N-terminal extremity. - Nodulation protein nodG from species of Azospirillum and Rhizobium which is probably involved in the modification of the nodulation Nod factor fatty acyl chain. - Nitrogen fixation protein fixR from Bradyrhizobium japonicum. - Bacillus subtilis protein dltE which is involved in the biosynthesis of D-alanyl-lipoteichoic acid. - Human follicular variant translocation protein 1 (FVT1). - Mouse adipocyte protein p27. - Mouse protein Ke 6. - Maize sex determination protein TASSELSEED 2. - Sarcophaga peregrina 25 Kd development specific protein. - Drosophila fat body protein P6. - A Listeria monocytogenes hypothetical protein encoded in the internalins gene region. - Escherichia coli hypothetical protein yciK. - Escherichia coli hypothetical protein ydfG. - Escherichia coli hypothetical protein yjgI. - Escherichia coli hypothetical protein yjgU. - Escherichia coli hypothetical protein yohF. - Bacillus subtilis hypothetical protein yoxD. - Bacillus subtilis hypothetical protein ywfD. - Bacillus subtilis hypothetical protein ywfH. - Yeast hypothetical protein YIL124w. - Yeast hypothetical protein YIR035c. - Yeast hypothetical protein YIR036c. - Yeast hypothetical protein YKL055c. - Fission yeast hypothetical protein SpAC23D3.11. One of the best conserved regions, which includes two perfectly conserved residues, a tyrosine and a lysine, has been used as a signature pattern for this family of proteins. The tyrosine residue participates in the catalytic mechanism.

Consensus pattern: [LIVSPADNK]-x(12)-Y-[PSTAGNCV]-[STAGNQCIVM]-[STAGC]-K-{PC}-[SAGFYR]-[LIVMSTAGD]-x(2)-[LIVMFYW]-x(3)-[LIVMFYWGAPTHQ]-[GSACQRHM] [Y is an active site residue]

[ 1] Joernvall H., Persson B., Krook M., Atrian S., Gonzalez-Duarte R., Jeffery J., Ghosh D. Biochemistry 34:6003-6013(1995).
[ 2] Villarroya A., Juan E., Egestad B., Joernvall H. Eur. J. Biochem. 180:191-197(1989).
[ 3] Persson B., Krook M., Joernvall H. Eur. J. Biochem. 200:537-543(1991).
[ 4] Neidle E.L., Hartnett C., Ornston N.L., Bairoch A., Rekik M., Harayama S. Eur. J. Biochem. 204:113-120(1992).

[0238] 46. (adh_zinc) Zinc-containing alcohol dehydrogenases signatures

Alcohol dehydrogenase (EC 1.1.1.1) (ADH) catalyzes the reversible oxidation of ethanol to acetaldehyde with the concomitant reduction of NAD [1]. Currently three structurally and catalytically different types of alcohol dehydrogenases are known: - Zinc-containing 'long-chain' alcohol dehydrogenases. - Insect-type, or 'short-chain' alcohol dehydrogenases. - Iron-containing alcohol dehydrogenases. Zinc-containing ADH's [2,3] are dimeric or tetrameric enzymes that bind two atoms of zinc per subunit. One of the zinc atoms is essential for catalytic activity while the other is not. Both zinc atoms are coordinated by either cysteine or histidine residues; the catalytic zinc is coordinated by two cysteines and one histidine. Zinc-containing ADH's are found in bacteria, mammals, plants, and in fungi. In most species there is more than one isozyme (for example, humans have at least six isozymes, yeast has three, etc.). A number of other zinc-dependent dehydrogenases are closely related to zinc ADH [4]; these are: - Xylitol dehydrogenase (EC 1.1.1.9) (D-xylulose reductase). - Sorbitol dehydrogenase (EC 1.1.1.14). - Aryl-alcohol dehydrogenase (EC 1.1.1.90) (benzyl alcohol dehydrogenase). - Threonine 3-dehydrogenase (EC 1.1.1.103). - Cinnamyl-alcohol dehydrogenase (EC 1.1.1.195) (CAD) [5]. CAD is a plant enzyme involved in the biosynthesis of lignin. - Galactitol-1-phosphate dehydrogenase (EC 1.1.1.251). - Pseudomonas putida 5-exo-alcohol dehydrogenase (EC 1.1.1.-) [6]. - Escherichia coli starvation sensing protein rspB. - Escherichia coli hypothetical protein yjgB. - Escherichia coli hypothetical protein yjgV. - Escherichia coli hypothetical protein yjjN. - Yeast hypothetical protein YAL060w (FUN49). - Yeast hypothetical protein YAL061w (FUN50). - Yeast hypothetical protein YCR105w. The pattern that has been developed to detect this class of enzymes is based on a conserved region that includes a histidine residue which is the second ligand of the catalytic zinc atom. This family also includes NADP-dependent quinone oxidoreductase (EC 1.6.5.5), an enzyme found in bacteria (gene qor), in yeast and in mammals where, in some species such as rodents, it has been recruited as an eye lens protein and is known as zeta-crystallin [7]. The sequence of quinone oxidoreductase is distantly related to that of other zinc-containing alcohol dehydrogenases and it lacks the zinc-ligand residues. The Torpedo fish and mammalian synaptic vesicle membrane protein vat-1 is related to qor. A specific pattern has been developed for this subfamily.
Consensus pattern: G-H-E-x(2)-G-x(5)-[GA]-x(2)-[IVSAC] [H is a zinc ligand] 

Consensus pattern: [GSD]-[DEQH]-x(2)-L-x(3)-[SA](2)-G-G-x-G-x(4)-Q-x(2)-[KR]

[ 1] Branden C.-I., Joernvall H., Eklund H., Furugren B. (In) The Enzymes (3rd edition) 11:104-190(1975).
[ 2] Joernvall H., Persson B., Jeffery J. Eur. J. Biochem. 167:195-201(1987).
[ 3] Sun H.-W., Plapp B.V. J. Mol. Evol. 34:522-535(1992).
[ 4] Persson B., Hallborn J., Walfridsson M., Hahn-Haegerdal B., Keraenen S., Penttilae M., Joernvall H. FEBS Lett. 324:9-14(1993).
[ 5] Knight M.E., Halpin C., Schuch W. Plant Mol. Biol. 19:793-801(1992).
[ 6] Koga H., Aramaki H., Yamaguchi E., Takeuchi K., Horiuchi T., Gunsalus I.C. J. Bacteriol. 166:1089-1095(1986).
[ 7] Joernvall H., Persson B., Du Bois G., Lavers G.C., Chen J.H., Gonzalez P., Rao P.V., Zigler J.S. Jr. FEBS Lett. 322:240-244(1993).

[0239] 47. (aldedh) Aldehyde dehydrogenases active sites 

[0240] Aldehyde dehydrogenases (EC 1.2.1.3 and EC 1.2.1.5) are enzymes which oxidize a wide variety of aliphatic and aromatic aldehydes. In mammals at least four different forms of the enzyme are known [1]: class-1 (or Ald C) a tetrameric cytosolic enzyme, class-2 (or Ald M) a tetrameric mitochondrial enzyme, class-3 (or Ald D) a dimeric cytosolic enzyme and class IV a microsomal enzyme. Aldehyde dehydrogenases have also been sequenced from fungal and bacterial species. A number of enzymes are known to be evolutionarily related to aldehyde dehydrogenases; these enzymes are listed below. - Plant and bacterial betaine-aldehyde dehydrogenase (EC 1.2.1.8) [2], an enzyme that catalyzes the last step in the biosynthesis of betaine. - Plant and bacterial NADP-dependent glyceraldehyde-3-phosphate dehydrogenase (EC 1.2.1.9). - Escherichia coli succinate-semialdehyde dehydrogenase (NADP+) (EC 1.2.1.16) (gene gabD) [3], which oxidizes succinate semialdehyde into succinate. - Escherichia coli lactaldehyde dehydrogenase (EC 1.2.1.22) (gene ald) [4]. - Mammalian succinate semialdehyde dehydrogenase (NAD+) (EC 1.2.1.24). - Escherichia coli phenylacetaldehyde dehydrogenase (EC 1.2.1.39). - Escherichia coli 5-carboxymethyl-2-hydroxymuconate semialdehyde dehydrogenase (gene hpcC). - Pseudomonas putida 2-hydroxymuconic semialdehyde dehydrogenase [5] (genes dmpC and xylG), an enzyme in the meta-cleavage pathway for the degradation of phenols, cresols and catechol. - Bacterial and mammalian methylmalonate-semialdehyde dehydrogenase (MMSDH) (EC 1.2.1.27) [6], an enzyme involved in the distal pathway of valine catabolism. - Yeast delta-1-pyrroline-5-carboxylate dehydrogenase (EC 1.5.1.12) [7] (gene PUT2), which converts proline to glutamate. - Bacterial multifunctional putA protein, which contains a delta-1-pyrroline-5-carboxylate dehydrogenase domain. - 26G, a garden pea protein of unknown function which is induced by dehydration of shoots [8]. - Mammalian formyltetrahydrofolate dehydrogenase (EC 1.5.1.6) [9]. This is a cytosolic enzyme responsible for the NADP-dependent decarboxylative reduction of 10-formyltetrahydrofolate into tetrahydrofolate. It is a protein of about 900 amino acids which consists of three domains; the C-terminal domain (480 residues) is structurally and functionally related to aldehyde dehydrogenases. - Yeast hypothetical protein YBR006w. - Yeast hypothetical protein YER073w. - Yeast hypothetical protein YHR039c. - Caenorhabditis elegans hypothetical protein F01F1.6. A glutamic acid and a cysteine residue have been implicated in the catalytic activity of mammalian aldehyde dehydrogenase. These residues are conserved in all the enzymes of this family. Two patterns have been derived for this family, one for each of the active site residues.

Consensus pattern: [LIVMFGA]-E-[LIMSTAC]-[GS]-G-[KNLM]-[SADN]-[TAPFV] [E is the active site residue]
Consensus pattern: [FYLVA]-x(3)-G-[QE]-x-C-[LIVMGSTANC]-[AGCN]-x-[GSTADNEKR] [C is the active site residue]
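
Because each of the two aldehyde dehydrogenase patterns is built around one annotated residue, a match can be used to report where the putative active-site glutamate and cysteine fall in a sequence. The following is a minimal sketch assuming Python and a hand translation of the two patterns. The names GLU_SITE, CYS_SITE and active_site_positions, and the test string, are hypothetical.

```python
import re

# Named groups mark the residue that each pattern annotates as the active site.
GLU_SITE = re.compile(r"[LIVMFGA](?P<site>E)[LIMSTAC][GS]G[KNLM][SADN][TAPFV]")
CYS_SITE = re.compile(r"[FYLVA].{3}G[QE].(?P<site>C)[LIVMGSTANC][AGCN].[GSTADNEKR]")

def active_site_positions(sequence: str) -> dict:
    """Return 1-based positions of the annotated Glu and Cys, or None if no match."""
    result = {}
    for name, regex in (("Glu", GLU_SITE), ("Cys", CYS_SITE)):
        m = regex.search(sequence)
        result[name] = m.start("site") + 1 if m else None
    return result

# Synthetic string built only to exercise both patterns; it is not a real protein.
example = "MM" + "LELGGKSP" + "MMMM" + "FAAAGQACLAAG"
positions = active_site_positions(example)
```

Only the first match of each pattern is reported. A sequence matching neither pattern yields None for both entries.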

[ 1] Hempel J., Harper K., Lindahl R. Biochemistry 28:1160-1167(1989).
[ 2] Weretilnyk E.A., Hanson A.D. Proc. Natl. Acad. Sci. U.S.A. 87:2745-2749(1990).
[ 3] Niegemann E., Schulz A., Bartsch K. Arch. Microbiol. 160:454-460(1993).
[ 4] Hidalgo E., Chen Y.-M., Lin E.C.C., Aguilar J. J. Bacteriol. 173:6118-6123(1991).
[ 5] Nordlund I., Shingler V. Biochim. Biophys. Acta 1049:227-230(1990).
[ 6] Steele M.I., Lorenz D., Hatter K., Park A., Sokatch J.R. J. Biol. Chem. 267:13585-13592(1992).
[ 7] Krzywicki K.A., Brandriss M.C. Mol. Cell. Biol. 4:2837-2842(1984).
[ 8] Guerrero F.D., Jones J.T., Mullet J.E. Plant Mol. Biol. 15:11-26(1990).
[ 9] Cook R.J., Lloyd R.S., Wagner C. J. Biol. Chem. 266:4965-4973(1991).

[0241] 48. Aldo/keto reductase family signatures 

The aldo-keto reductase family [1,2] groups together a number of structurally and functionally related NADPH-dependent oxidoreductases as well as some other proteins. The proteins known to belong to this family are: - Aldehyde reductase (EC 1.1.1.2). - Aldose reductase (EC 1.1.1.21). - 3-alpha-hydroxysteroid dehydrogenase (EC 1.1.1.50), which terminates androgen action by converting 5-alpha-dihydrotestosterone to 3-alpha-androstanediol. - Prostaglandin F synthase (EC 1.1.1.188) which catalyzes the reduction of prostaglandins H2 and D2 to F2-alpha. - D-sorbitol-6-phosphate dehydrogenase (EC 1.1.1.200) from apple. - Morphine 6-dehydrogenase (EC 1.1.1.218) from Pseudomonas putida plasmid pMDH7.2 (gene morA). - Chlordecone reductase (EC 1.1.1.225) which reduces the pesticide chlordecone (kepone) to the corresponding alcohol. - 2,5-diketo-D-gluconic acid reductase (EC 1.1.1.-) which catalyzes the reduction of 2,5-diketogluconic acid to 2-keto-L-gulonic acid, a key intermediate in the production of ascorbic acid. - NAD(P)H-dependent xylose reductase (EC 1.1.1.-) from the yeast Pichia stipitis. This enzyme reduces xylose into xylitol. - Trans-1,2-dihydrobenzene-1,2-diol dehydrogenase (EC 1.3.1.20). - 3-oxo-5-beta-steroid 4-dehydrogenase (EC 1.3.99.6) which catalyzes the reduction of delta(4)-3-oxosteroids. - A soybean reductase, which co-acts with chalcone synthase in the formation of 4,2',4'-trihydroxychalcone. - Frog eye lens rho crystallin. - Yeast GCY protein, whose function is not known. - Leishmania major P110/11E protein. P110/11E is a developmentally regulated protein whose abundance is markedly elevated in promastigotes compared with amastigotes. Its exact function is not yet known. - Escherichia coli hypothetical protein yafB. - Escherichia coli hypothetical protein yghE. - Yeast hypothetical protein YBR149w. - Yeast hypothetical protein YHR104w. - Yeast hypothetical protein YJR096w. These proteins all have about 300 amino acid residues. Three consensus patterns have been developed that are specific to this family of proteins. The first one is located in the N-terminal section of these proteins. The second pattern is located in the central section. The third pattern, located in the C-terminal section, is centered on a lysine residue whose chemical modification, in aldose and aldehyde reductases, affects the catalytic efficiency.

Consensus pattern: G-[FY]-R-[HSAL]-[LIVMF]-D-[STAGC]-[AS]-x(5)-E-x(2)-[LIVM]-G
Consensus pattern: [LIVMFY]-x(9)-[KREQ]-x-[LIVM]-G-[LIVM]-[SC]-N-[FY]
Consensus pattern: [LIVM]-[PAIV]-[KR]-[ST]-x(4)-R-x(2)-[GSTAEQK]-[NSL]-x(2)-[LIVMFA] [K is a putative active site residue]

[ 1] Bohren K.M., Bullock B., Wermuth B., Gabbay K.H. J. Biol. Chem. 264:9547-9551(1989).
[ 2] Bruce N.C., Willey D.L., Coulson A.F.W., Jeffery J. Biochem. J. 299:805-811(1994).

[0242] 49. Alpha amylase. This family is classified as family 13 of the glycosyl hydrolases. The structure is an 8-stranded alpha/beta barrel, interrupted by a ~70 a.a. calcium-binding domain protruding between beta strand 3 and alpha helix 3, and a carboxyl-terminal Greek key beta-barrel domain.

[1] Larson SB, Greenwood A, Cascio D, Day J, McPherson A. J Mol Biol 1994;235:1560-1584.
[0243] 50. Aminotransferases class-I pyridoxal-phosphate attachment site

Aminotransferases share certain mechanistic features with other pyridoxal-phosphate dependent enzymes, such as the covalent binding of the pyridoxal-phosphate group to a lysine residue. On the basis of sequence similarity, these various enzymes can be grouped [1,2] into subfamilies. One of these, called class-I, currently consists of the following enzymes: - Aspartate aminotransferase (AAT) (EC 2.6.1.1). AAT catalyzes the reversible transfer of the amino group from L-aspartate to 2-oxoglutarate to form oxaloacetate and L-glutamate. In eukaryotes, there are two AAT isozymes: one is located in the mitochondrial matrix, the second is cytoplasmic. In prokaryotes, only one form of AAT is found (gene aspC). - Tyrosine aminotransferase (EC 2.6.1.5) which catalyzes the first step in tyrosine catabolism by reversibly transferring its amino group to 2-oxoglutarate to form 4-hydroxyphenylpyruvate and L-glutamate. - Aromatic aminotransferase (EC 2.6.1.57) involved in the synthesis of Phe, Tyr, Asp and Leu (gene tyrB). - 1-aminocyclopropane-1-carboxylate synthase (EC 4.4.1.14) (ACC synthase) from plants. ACC synthase catalyzes the first step in ethylene biosynthesis. - Pseudomonas denitrificans cobC, which is involved in cobalamin biosynthesis. - Yeast hypothetical protein YJL060w. The sequence around the pyridoxal-phosphate attachment site of this class of enzyme is sufficiently conserved to allow the creation of a specific pattern.

Consensus pattern: [GS]-[LIVMFYTAC]-[GSTA]-K-x(2)-[GSALVN]-[LIVMFA]-x-[GNAR]-x-R-[LIVMA]-[GA] [K is the pyridoxal-P attachment site]

[ 1] Bairoch A. Unpublished observations (1992). 

[ 2] Sung M.H., Tanizawa K., Tanaka H., Kuramitsu S., Kagamiyama H., Hirotsu K., Okamoto A., Higuchi T., Soda K. J. Biol. Chem. 266:2567-2572(1991).

[0244] 51. Aminotransferases class-II pyridoxal-phosphate attachment site

Aminotransferases share certain mechanistic features with other pyridoxal-phosphate dependent enzymes, such as the covalent binding of the pyridoxal-phosphate group to a lysine residue. On the basis of sequence similarity, these various enzymes can be grouped [1] into subfamilies. One of these, called class-II, currently consists of the following enzymes: - Glycine acetyltransferase (EC 2.3.1.29), which catalyzes the addition of acetyl-CoA to glycine to form 2-amino-3-oxobutanoate (gene kbl). - 5-aminolevulinic acid synthase (EC 2.3.1.37) (delta-ALA synthase), which catalyzes the first step in heme biosynthesis via the Shemin (or C4) pathway, i.e. the addition of succinyl-CoA to glycine to form 5-aminolevulinate. - 8-amino-7-oxononanoate synthase (EC 2.3.1.47) (7-KAP synthetase), a bacterial enzyme (gene bioF) which catalyzes an intermediate step in the biosynthesis of biotin: the addition of 6-carboxy-hexanoyl-CoA to alanine to form 8-amino-7-oxononanoate. - Histidinol-phosphate aminotransferase (EC 2.6.1.9), which catalyzes the eighth step in the histidine biosynthetic pathway: the transfer of an amino group from 3-(imidazol-4-yl)-2-oxopropyl phosphate to glutamic acid to form histidinol phosphate and 2-oxoglutarate. - Serine palmitoyltransferase (EC 2.3.1.50) from yeast (genes LCB1 and LCB2), which catalyzes the condensation of palmitoyl-CoA and serine to form 3-ketosphinganine. The sequence around the pyridoxal-phosphate attachment site of this class of enzyme is sufficiently conserved to allow the creation of a specific pattern.

Consensus pattern: T-[LIVMFYW]-[STAG]-K-[SAG]-[LIVMFYWR]-[SAG]-x(2)-[SAG] [K is the pyridoxal-P attachment site]
[ 1] Bairoch A. Unpublished observations (1991). 

[0245] 52. Aminotransferases class-III pyridoxal-phosphate attachment site

Aminotransferases share certain mechanistic features with other pyridoxal-phosphate dependent enzymes, such as the covalent binding of the pyridoxal-phosphate group to a lysine residue. On the basis of sequence similarity, these various enzymes can be grouped [1,2] into subfamilies. One of these, called class-III, currently consists of the following enzymes: - Acetylornithine aminotransferase (EC 2.6.1.11) which catalyzes the transfer of an amino group from acetylornithine to alpha-ketoglutarate, yielding N-acetyl-glutamic-5-semi-aldehyde and glutamic acid. - Ornithine aminotransferase (EC 2.6.1.13), which catalyzes the transfer of an amino group from ornithine to alpha-ketoglutarate, yielding glutamic-5-semi-aldehyde and glutamic acid. - Omega-amino acid-pyruvate aminotransferase (EC 2.6.1.18), which catalyzes transamination between a variety of omega-amino acids, mono- and diamines, and pyruvate. It plays a pivotal role in omega amino acids metabolism. - 4-aminobutyrate aminotransferase (EC 2.6.1.19) (GABA transaminase), which catalyzes the transfer of an amino group from GABA to alpha-ketoglutarate, yielding succinate semialdehyde and glutamic acid. - DAPA aminotransferase (EC 2.6.1.62), a bacterial enzyme (gene bioA) which catalyzes an intermediate step in the biosynthesis of biotin, the transamination of 7-keto-8-aminopelargonic acid (7-KAP) to form 7,8-diaminopelargonic acid (DAPA). - 2,2-dialkylglycine decarboxylase (EC 4.1.1.64), a Pseudomonas cepacia enzyme (gene dgdA) that catalyzes the decarboxylating amino transfer of 2,2-dialkylglycine and pyruvate to dialkyl ketone, alanine and carbon dioxide. - Glutamate-1-semialdehyde aminotransferase (EC 5.4.3.8) (GSA). GSA is the enzyme involved in the second step of porphyrin biosynthesis, via the C5 pathway. It transfers the amino group on carbon 2 of glutamate-1-semialdehyde to the neighbouring carbon, to give delta-aminolevulinic acid. - Bacillus subtilis aminotransferase yhxA. - Bacillus subtilis aminotransferase yodX. - Haemophilus influenzae aminotransferase HI0949. - Caenorhabditis elegans aminotransferase T01B11.2. The sequence around the pyridoxal-phosphate attachment site of this class of enzyme is sufficiently conserved to allow the creation of a specific pattern.

Consensus pattern: [LIVMFYWC](2)-x-D-E-[IVA]-x(2)-G-[LIVMFAGC]-x(0,1)-[RSACLI]-x-[GSAD]-x(12,16)-D-[LIVMFC]-[LIVMFYSTA]-x(2)-[GSA]-K-x(3)-[GSTADNV]-[GSAC] [K is the pyridoxal-P attachment site]
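
The class-III pattern differs from most of the preceding ones in that two of its gaps have variable length, written x(0,1) and x(12,16). A minimal sketch of how such ranges constrain a match is given below, assuming Python and a hand translation of the pattern as read here. The names CLASS_III, make_test_sequence and matches_class_iii are hypothetical, and the test strings are synthetic.

```python
import re

# x(0,1) -> .{0,1} and x(12,16) -> .{12,16}; other elements map one to one.
CLASS_III = re.compile(
    r"[LIVMFYWC]{2}.DE[IVA].{2}G[LIVMFAGC].{0,1}[RSACLI].[GSAD]"
    r".{12,16}D[LIVMFC][LIVMFYSTA].{2}[GSA]K.{3}[GSTADNV][GSAC]"
)

def make_test_sequence(spacer_length: int) -> str:
    """Build a synthetic string whose variable gap holds spacer_length residues."""
    head = "LL" + "Q" + "DE" + "I" + "QQ" + "G" + "L" + "R" + "Q" + "G"
    tail = "D" + "L" + "L" + "QQ" + "G" + "K" + "QQQ" + "G" + "G"
    return head + "Q" * spacer_length + tail

def matches_class_iii(sequence: str) -> bool:
    return CLASS_III.search(sequence) is not None
```

With this construction a spacer of 12 to 16 residues is accepted, while 11 or 17 residues place the conserved aspartate outside the permitted window and the string is rejected.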

[ 1] Bairoch A. Unpublished observations (1992).
[ 2] Yonaha K., Nishie M., Aibara S. J. Biol. Chem. 267:12506-12510(1992).

[0246] 53. Ank repeat. There's no clear separation between noise and signal on the HMM search. Ankyrin repeats generally consist of a beta, alpha, alpha, beta order of secondary structures. The repeats associate to form a higher order structure.

[1] A. Holak TA, FEBS Lett 1997;401:127-132.
[2] Lux SE, John KM, Bennett V, Nature 1990;345:736-739.

[0247] 54. Aminotransferases class-IV signature 

[0248] Aminotransferases share certain mechanistic features with other pyridoxal-phosphate dependent enzymes, such as the covalent binding of the pyridoxal-phosphate group to a lysine residue. On the basis of sequence similarity, these various enzymes can be grouped [1,2] into subfamilies. One of these, called class-IV, currently consists of the following enzymes:

- Branched-chain amino-acid aminotransferase (EC 2.6.1.42) (transaminase B), a bacterial (gene ilvE) and eukaryotic enzyme which catalyzes the reversible transfer of an amino group from glutamate to 4-methyl-2-oxopentanoate, to form leucine and 2-oxoglutarate.

- D-alanine aminotransferase (EC 2.6.1.21), a bacterial enzyme which catalyzes the transfer of the amino group from D-alanine (and other D-amino acids) to 2-oxoglutarate, to form pyruvate and D-glutamate.

- 4-amino-4-deoxychorismate (ADC) lyase (gene pabC), a bacterial enzyme that converts ADC into 4-aminobenzoate (PABA) and pyruvate.

[0249] The above enzymes are proteins of about 270 to 415 amino-acid residues that share a few regions of sequence similarity. Surprisingly, the best-conserved region does not include the lysine residue to which the pyridoxal-phosphate group is known to be attached in ilvE. The region that has been selected as a signature pattern is located some 40 residues at the C-terminus side of the PLP-lysine.
Consensus pattern: E-x-[STAGCI]-x(2)-N-[LIVMFAC]-[FY]-x(6,12)-[LIVMF]-x-T-x(6,8)-[LIVM]-x-[GS]-[LIVM]

[1] Green J.M.. Merisel W.K., Nichols B.P J. Bacteriol. 174:5317-5323(1992). 
[2] Bairoch A. Unpublished observations (1992). 

[0250] 55. Aminotransferases class-V pyridoxal-phosphate attachment site

Aminotransferases share certain mechanistic features with other pyridoxal-phosphate dependent enzymes, such as
the covalent binding of the pyridoxal-phosphate group to a lysine residue. On the basis of sequence similarity, these
various enzymes can be grouped [1,2] into subfamilies. One of these, called class-V, currently consists of the following
enzymes:

- Phosphoserine aminotransferase (EC 2.6.1.52), an enzyme which catalyzes the reversible interconversion
of phosphoserine and 2-oxoglutarate to 3-phosphonooxypyruvate and glutamate. It is required both in the major
phosphorylated pathway of serine biosynthesis and in pyridoxine biosynthesis. The bacterial enzyme (gene serC) is highly
similar to a rabbit endometrial progesterone-induced protein (EPIP), which is probably a phosphoserine
aminotransferase [3].
- Serine-glyoxylate aminotransferase (EC 2.6.1.45) (SGAT) (gene sgaA) from Methylobacterium extorquens.
- Serine-pyruvate aminotransferase (EC 2.6.1.51). This enzyme also acts as an alanine-glyoxylate
aminotransferase (EC 2.6.1.44). In vertebrates, it is located in the peroxisomes and/or mitochondria.
- Isopenicillin N epimerase (gene cefD). This enzyme is involved in the biosynthesis of cephalosporin antibiotics and
catalyzes the reversible isomerization of isopenicillin N and penicillin N.
- NifS, a protein of the nitrogen fixation operon of some bacteria and cyanobacteria. The exact function of nifS is not
yet known. A highly similar protein has been found in fungi (gene NFS1 or SPL1).
- The small subunit of cyanobacterial soluble hydrogenase (EC 1.12.-.-).
- Hypothetical protein ycbU from Bacillus subtilis.
- Hypothetical protein YFL030w from yeast.

The sequence around the pyridoxal-phosphate attachment site of this class of enzyme is sufficiently conserved to allow
the creation of a specific pattern.

Consensus pattern: [LIVFYCHT]-[DGH]-[LIVMFYAC]-[LIVMFYA]-x(2)-[GSTAC]-[GSTA]-[HQR]-K-x(4,6)-G-x-[GSAT]-x-[LIVMFYSAC] [K is the pyridoxal-P attachment site]
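
As an illustration only, the class-V consensus pattern above can be written as a regular expression in which the attachment-site lysine sits in a capturing group, so that a scan reports the residue number of the candidate pyridoxal-P lysine rather than only the start of the match. This is a minimal sketch in Python, not the search procedure used for the present sequences; the function name and the test fragment are hypothetical.

```python
import re

# Hand translation of the class-V consensus pattern: "x" becomes ".",
# "x(4,6)" becomes ".{4,6}", and the attachment-site K is captured.
CLASS_V_SITE = re.compile(
    r"[LIVFYCHT][DGH][LIVMFYAC][LIVMFYA].{2}[GSTAC][GSTA][HQR]"
    r"(K).{4,6}G.[GSAT].[LIVMFYSAC]"
)


def attachment_lysines(sequence):
    """Return 1-based positions of lysines aligned with the K of the pattern."""
    return [m.start(1) + 1 for m in CLASS_V_SITE.finditer(sequence.upper())]


# Hypothetical fragment constructed to satisfy the pattern (not a real protein).
example = "MS" + "LDLLAAGSHKAAAAGAGAL" + "EE"
print(attachment_lysines(example))  # prints [12]
```

Capturing the lysine keeps the biologically relevant coordinate separate from the match boundaries, which shift with the variable-length x(4,6) segment.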

[1] Ouzounis C., Sander C. FEBS Lett. 322:159-164(1993).
[2] Bairoch A. Unpublished observations (1992).
[3] van der Zel A., Lam H.-M., Winkler M.E. Nucleic Acids Res. 17:8379-8379(1989).

[0251] 56. Annexins repeated domain signature

Annexins [1 to 6] are a group of calcium-binding proteins that associate reversibly with membranes. They bind to
phospholipid bilayers in the presence of micromolar free calcium concentration. The binding is specific for calcium and
for acidic phospholipids. Annexins have been claimed to be involved in cytoskeletal interactions, phospholipase
inhibition, intracellular signalling, anticoagulation, and membrane fusion. Each of these proteins consists of an N-terminal
domain of variable length followed by four or eight copies of a conserved segment of sixty-one residues. The repeat
(sometimes known as an 'endonexin fold') consists of five alpha-helices that are wound into a right-handed superhelix
[7]. The proteins known to belong to the annexin family are listed below:

- Annexin I (Lipocortin 1) (Calpactin 2) (p35) (Chromobindin 9).
- Annexin II (Lipocortin 2) (Calpactin 1) (Protein I) (p36) (Chromobindin 8).
- Annexin III (Lipocortin 3) (PAP-III).
- Annexin IV (Lipocortin 4) (Endonexin I) (Protein II) (Chromobindin 4).
- Annexin V (Lipocortin 5) (Endonexin 2) (VAC-alpha) (Anchorin CII) (PAP-I).
- Annexin VI (Lipocortin 6) (Protein III) (Chromobindin 20) (p68) (p70). This is the only known annexin that
contains 8 (instead of 4) repeats.
- Annexin VII (Synexin).
- Annexin VIII (Vascular anticoagulant-beta) (VAC-beta).
- Annexin IX from Drosophila.
- Annexin X from Drosophila.
- Annexin XI (Calcyclin-associated annexin) (CAP-50).
- Annexin XII from Hydra vulgaris.
- Annexin XIII (Intestine-specific annexin) (ISA).

The signature pattern for this domain spans positions 9 to 61 of the repeat and includes the only perfectly conserved
residue (an arginine in position 22).

Consensus pattern: [TG]-[STV]-x(8)-[LIVMF]-x(2)-R-x(3)-[DEQNH]-x(7)-[IFY]-x(7)-[LIVMF]-x(3)-[LIVMF]-x(11)-[LIVMFA]-x(2)-[LIVMF]

[1] Raynal P., Pollard H.B. Biochim. Biophys. Acta 1197:63-93(1994).
[2] Barton G.J., Newman R.H., Freemont P.S., Crumpton M.J. Eur. J. Biochem. 198:749-760(1991).
[3] Burgoyne R.D., Geisow M.J. Cell Calcium 10:1-10(1989).
[4] Haigler H.T., Fitch J.M., Jones J.M., Schlaepfer D.D. Trends Biochem. Sci. 14:48-50(1989).
[5] Klee C.B. Biochemistry 27:6645-6653(1988).
[6] Smith P.D., Moss S.E. Trends Genet. 10:241-246(1994).
[7] Huber R., Roemisch J., Paques E.-P. EMBO J. 9:3867-3874(1990).
[8] Fiedler K., Simons K. Trends Biochem. Sci. 20:177-178(1995).

[0252] 57. (arf_1) ADP-ribosylation factors family signature

ADP-ribosylation factors (ARF) [1,2,3,4] are 20 Kd GTP-binding proteins involved in protein trafficking. They may
modulate vesicle budding and uncoating within the Golgi apparatus. ARFs also act as allosteric activators of cholera toxin
ADP-ribosyltransferase activity. They are evolutionarily conserved and present in all eukaryotes. At least six forms of
ARF are present in mammals and three in budding yeast. The ARF family also includes proteins that are highly related
to ARFs but lack the cholera toxin cofactor activity; they are collectively known as ARLs (ARF-like). ARD1 is a 64 Kd
mammalian protein of unknown biological function that contains an ARF domain at its C-terminal extremity. Proteins
from the ARF family are generally included in the RAS 'superfamily' of small GTP-binding proteins [5], but they are
only slightly related to the other RAS proteins. They also differ from RAS proteins in that they lack cysteine residues
at their C-termini and are therefore not subject to prenylation. The ARFs are N-terminally myristoylated (the ARLs have
not yet been shown to be modified in such a fashion). A conserved region in the C-terminal part of ARFs and ARLs
has been selected as a signature pattern.

Consensus pattern: [HRQT]-x-[FYWI]-x-[LIVM]-x(4)-A-x(2)-G-x(2)-[LIVM]-x(2)-[GSA]-[LIVMF]-x-[WK]-[LIVM]

Note: proteins belonging to this family also contain a copy of the ATP/GTP-binding motif 'A' (P-loop) (see <PDOC00017>).

[1] Boman A.L., Kahn R.A. Trends Biochem. Sci. 20:147-150(1995).
[2] Moss J., Vaughan M. Cell. Signal. 4:367-399(1993).
[3] Moss J., Vaughan M. Prog. Nucleic Acid Res. Mol. Biol. 45:47-65(1993).
[4] Amor J.C., Harrison D.H., Kahn R.A., Ringe D. Nature 372:704-708(1994).
[5] Valencia A., Chardin P., Wittinghofer A., Sander C. Biochemistry 30:4637-4648(1991).

[0253] (arf_2) ATP/GTP-binding site motif A (P-loop)

From sequence comparisons and crystallographic data analysis it has been shown [1,2,3,4,5,6] that an appreciable
proportion of proteins that bind ATP or GTP share a number of more or less conserved sequence motifs. The best
conserved of these motifs is a glycine-rich region, which typically forms a flexible loop between a beta-strand and an
alpha-helix. This loop interacts with one of the phosphate groups of the nucleotide. This sequence motif is generally
referred to as the 'A' consensus sequence [1] or the 'P-loop' [5]. There are numerous ATP- or GTP-binding proteins in
which the P-loop is found. A number of protein families for which the relevance of the presence of such a motif has been
noted are listed below:

- ATP synthase alpha and beta subunits (see <PDOC00137>).
- Myosin heavy chains.
- Kinesin heavy chains and kinesin-like proteins (see <PDOC00343>).
- Dynamins and dynamin-like proteins (see <PDOC00362>).
- Guanylate kinase (see <PDOC00670>).
- Thymidine kinase (see <PDOC00524>).
- Thymidylate kinase (see <PDOC01034>).
- Shikimate kinase (see <PDOC00868>).
- Nitrogenase iron protein family (nifH/frxC) (see <PDOC00580>).
- ATP-binding proteins involved in 'active transport' (ABC transporters) [7] (see <PDOC00185>).
- DNA and RNA helicases [8,9,10].
- GTP-binding elongation factors (EF-Tu, EF-1alpha, EF-G, etc.).
- Ras family of GTP-binding proteins (Ras, Rho, Rab, Ral, Ypt1, SEC4, etc.).
- Nuclear protein ran (see <PDOC00859>).
- ADP-ribosylation factors family (see <PDOC00781>).
- Bacterial dnaA protein (see <PDOC00771>).
- Bacterial recA protein (see <PDOC00131>).
- Bacterial recF protein (see <PDOC00539>).
- Guanine nucleotide-binding proteins alpha subunits (Gi, Gs, Gt, G0, etc.).
- DNA mismatch repair proteins mutS family (see <PDOC00388>).
- Bacterial type II secretion system protein E (see <PDOC00567>).

Not all ATP- or GTP-binding proteins are picked up by this motif. A number of proteins escape detection because the
structure of their ATP-binding site is completely different from that of the P-loop. Examples of such proteins are the
E1-E2 ATPases or the glycolytic kinases. In other ATP- or GTP-binding proteins the flexible loop exists in a slightly
different form; this is the case for tubulins or protein kinases. A special mention must be reserved for adenylate kinase,
in which there is a single deviation from the P-loop pattern: in the last position Gly is found instead of Ser or Thr.

Consensus pattern: [AG]-x(4)-G-K-[ST]
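
The consensus patterns given throughout this section follow one small grammar: elements are separated by '-', 'x' stands for any residue, '[...]' lists the residues allowed at a position, and '(n)' or '(n,m)' gives a repeat count. The following Python sketch shows one way such a pattern could be turned into a regular expression and applied to a protein sequence. It is an illustration under stated assumptions, not the search software used in the present work: the helper names are hypothetical, the '{...}' (excluded residue) form is handled only for completeness, and the test fragments are illustrative strings.

```python
import re

# One pattern element: a residue set, an excluded set, a single residue or x,
# optionally followed by a repeat count such as (4) or (0,1).
_ELEMENT = re.compile(
    r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])"
    r"(?:\((\d+)(?:,(\d+))?\))?"
)


def prosite_to_regex(pattern):
    """Translate a PROSITE-style consensus pattern into a regex string."""
    pieces = []
    for element in pattern.strip().strip("-").split("-"):
        match = _ELEMENT.fullmatch(element.strip())
        if match is None:
            raise ValueError("unrecognised pattern element: %r" % element)
        core, low, high = match.groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # any residue except those listed
        if low is not None:
            core += "{%s}" % (low if high is None else low + "," + high)
        pieces.append(core)
    return "".join(pieces)


P_LOOP = re.compile(prosite_to_regex("[AG]-x(4)-G-K-[ST]"))


def find_p_loops(sequence):
    """Return (1-based start, matched residues) for each P-loop match."""
    return [(m.start() + 1, m.group()) for m in P_LOOP.finditer(sequence.upper())]


# Illustrative Ras-like N-terminal fragment: the motif GAGGVGKS starts at residue 10.
print(find_p_loops("MTEYKLVVVGAGGVGKSALTIQ"))
# Illustrative adenylate-kinase-like fragment: Gly in the last position, so no match.
print(find_p_loops("AAGPPGSGKGTQ"))
```

The second call returns an empty list, which mirrors the remark above that adenylate kinase deviates from the pattern in its last position.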

[1] Walker J.E., Saraste M., Runswick M.J., Gay N.J. EMBO J. 1:945-951(1982).
[2] Moller W., Amons R. FEBS Lett. 186:1-7(1985).
[3] Fry D.C., Kuby S.A., Mildvan A.S. Proc. Natl. Acad. Sci. U.S.A. 83:907-911(1986).
[4] Dever T.E., Glynias M.J., Merrick W.C. Proc. Natl. Acad. Sci. U.S.A. 84:1814-1818(1987).
[5] Saraste M., Sibbald P.R., Wittinghofer A. Trends Biochem. Sci. 15:430-434(1990).
[6] Koonin E.V. J. Mol. Biol. 229:1165-1174(1993).
[7] Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R., Gallagher M.P. J. Bioenerg. Biomembr.
22:571-592(1990).
[8] Hodgman T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).
[9] Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K., Schnier J., Slonimski P.P. Nature
337:121-122(1989).
[10] Gorbalenya A.E., Koonin E.V., Donchenko A.P., Blinov V.M. Nucleic Acids Res. 17:4713-4730(1989).

[0254] 58. Arginase family signatures

The following enzymes have been shown [1] to be evolutionarily related:

- Arginase (EC 3.5.3.1), a ubiquitous enzyme which catalyzes the degradation of arginine to ornithine and urea [2].
- Agmatinase (EC 3.5.3.11) (agmatine ureohydrolase), a prokaryotic enzyme (gene speB) that catalyzes the hydrolysis
of agmatine into putrescine and urea.
- Formiminoglutamase (EC 3.5.3.8) (formiminoglutamate hydrolase), a prokaryotic enzyme (gene hutG) that hydrolyzes
N-formimino-glutamate into glutamate and formamide.
- Hypothetical proteins from methanogenic archaebacteria.

These enzymes are proteins of about 300 amino-acid residues. Three conserved regions that contain charged residues
which are involved in the binding of the two manganese ions [3] can be used as signature patterns.

Consensus pattern: [LIVMF]-G-G-x-H-x-[LIVMT]-[STAV]-x-[PAG]-x(3)-[GSTA] [H binds manganese]

Consensus pattern: [LIVM](2)-x-[LIVMFY]-D-[AS]-H-x-D [The two D's and the H bind manganese]

Consensus pattern: [ST]-[LIVMFY]-D-[LIVM]-D-x(3)-[PAQ]-x(3)-P-[GSA]-x(7)-G [The two D's bind manganese]
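
When a family is described by several signature patterns, as with the three arginase regions above, a candidate sequence can be scored by which signatures it contains. The Python sketch below hand-translates the three patterns into regular expressions and reports the position of the first match of each. The decision rule used here (all three regions must be present) is an assumption made for the example rather than a rule stated in this specification, and the names and test strings are hypothetical.

```python
import re

# Hand translations of the three arginase-family consensus patterns.
ARGINASE_SIGNATURES = {
    "region 1": re.compile(r"[LIVMF]GG.H.[LIVMT][STAV].[PAG].{3}[GSTA]"),
    "region 2": re.compile(r"[LIVM]{2}.[LIVMFY]D[AS]H.D"),
    "region 3": re.compile(r"[ST][LIVMFY]D[LIVM]D.{3}[PAQ].{3}P[GSA].{7}G"),
}


def signature_hits(sequence):
    """Map each signature name to the 1-based start of its first match, or None."""
    hits = {}
    for name, regex in ARGINASE_SIGNATURES.items():
        match = regex.search(sequence.upper())
        hits[name] = None if match is None else match.start() + 1
    return hits


def has_all_signatures(sequence):
    """Assumed rule for this sketch: every one of the three regions must match."""
    return all(pos is not None for pos in signature_hits(sequence).values())


# Hypothetical test strings built from fragments that satisfy each pattern.
full = ("MK" + "LGGDHSLAIGSISG" + "AAAA" + "VIWVDAHTD" + "KKKK"
        + "SFDVDGLDPSFTPATGTPVVGG" + "EE")
partial = "MK" + "LGGDHSLAIGSISG" + "AAAA" + "VIWVDAHTD" + "EE"

print(signature_hits(full))
print(has_all_signatures(full), has_all_signatures(partial))
```

Reporting the per-region positions, rather than a single yes/no, makes it easy to see which manganese-binding region is missing from a partial or divergent sequence.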

[1] Ouzounis C., Kyrpides N.C. J. Mol. Evol. 39:101-104(1994).
[2] Jenkinson C.P., Grody W.W., Cederbaum S.D. Comp. Biochem. Physiol. 114B:107-132(1996).
[3] Kanyo Z.F., Scolnick L.R., Ash D.E., Christianson D.W. Nature 383:554-557(1996).

[0255] 59. (asp) Eukaryotic and viral aspartyl proteases active site

Aspartyl proteases, also known as acid proteases (EC 3.4.23.-), are a widely distributed family of proteolytic enzymes
[1,2,3] known to exist in vertebrates, fungi, plants, retroviruses and some plant viruses. Aspartate proteases of
eukaryotes are monomeric enzymes which consist of two domains. Each domain contains an active site centered on a
catalytic aspartyl residue. The two domains most probably evolved from the duplication of an ancestral gene encoding
a primordial domain. Currently known eukaryotic aspartyl proteases are:

- Vertebrate gastric pepsins A and C (also known as gastricsin).
- Vertebrate chymosin (rennin), involved in digestion and used for making cheese.
- Vertebrate lysosomal cathepsins D (EC 3.4.23.5) and E (EC 3.4.23.34).
- Mammalian renin (EC 3.4.23.15), whose function is to generate angiotensin I from angiotensinogen in the plasma.
- Fungal proteases such as aspergillopepsin A (EC 3.4.23.18), candidapepsin (EC 3.4.23.24), mucoropepsin
(EC 3.4.23.23) (mucor rennin), endothiapepsin (EC 3.4.23.22), polyporopepsin (EC 3.4.23.29), and rhizopuspepsin
(EC 3.4.23.21).
- Yeast saccharopepsin (EC 3.4.23.25) (proteinase A) (gene PEP4). PEP4 is implicated in posttranslational regulation
of vacuolar hydrolases.
- Yeast barrier pepsin (EC 3.4.23.35) (gene BAR1), a protease that cleaves alpha-factor and thus acts as an antagonist
of the mating pheromone.
- Fission yeast sxa1, which is involved in degrading or processing the mating pheromones.

Most retroviruses and some plant viruses, such as badnaviruses, encode an aspartyl protease which is a homodimer of
a chain of about 95 to 125 amino acids. In most retroviruses, the protease is encoded as a segment of a polyprotein
which is cleaved during the maturation process of the virus. It is generally part of the pol polyprotein and, more rarely,
of the gag polyprotein. Conservation of the sequence around the two aspartates of eukaryotic aspartyl proteases and
around the single active site of the viral proteases allows us to develop a single signature pattern for both groups of
protease.

Consensus pattern: [LIVMFGAC]-[LIVMTADN]-[LIVFSA]-D-[ST]-G-[STAV]-[STAPDENQ]-x-[LIVMFSTNC]-x-[LIVMFGTA] [D is the active site residue]

Note: these proteins belong to families A1 and A2 in the classification of peptidases [4,E1].

[1] Foltmann B. Essays Biochem. 17:52-84(1981).
[2] Davies D.R. Annu. Rev. Biophys. Chem. 19:189-215(1990).
[3] Rao J.K.M., Erickson J.W., Wlodawer A. Biochemistry 30:4663-4671(1991).
[4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:105-120(1995).

[0256] 60. (BIRA) Biotin repressor

[1] Wilson KP, Shewchuk LM, Brennan RG, Otsuka AJ, Matthews BW; Proc Natl Acad Sci U S A 1992;89:9257-9261.

[0257] 61. BTB/POZ domain

The BTB (for BR-C, ttk and bab) [1] or POZ (for Pox virus and Zinc finger) [2] domain is present near the N-terminus
of a fraction of zinc finger (zf-C2H2) proteins and in proteins that contain the Kelch motif, such as Kelch and a family
of pox virus proteins. The BTB/POZ domain mediates homomeric dimerisation and in some
instances heteromeric dimerisation [2]. The structure of the dimerised PLZF BTB/POZ domain has been solved and
consists of a tightly intertwined homodimer. The central scaffolding of the protein is made up of a cluster of
alpha-helices flanked by short beta-sheets at both the top and bottom of the molecule [3]. POZ domains from several zinc
finger proteins have been shown to mediate transcriptional repression and to interact with components of histone
deacetylase co-repressor complexes including N-CoR and SMRT [4,5,6]. The POZ or BTB domain is also known as
BR-C/Ttk or ZiN.

[1] Zollman S, Godt D, Prive GG, Couderc JL, Laski FA; Proc Natl Acad Sci U S A 1994;91:10717-10721.
[2] Bardwell VJ, Treisman R; Genes Dev 1994;8:1664-1677.
[3] Ahmad KF, Engel CK, Prive GG; Proc Natl Acad Sci U S A 1998;95:12123-12128.
[4] Deweindt C, Albagli O, Bernardin F, Dhordain P, Quief S, Lantoine D, Kerckaert JP, Leprince D; Cell Growth
Differ 1995;6:1495-1503.
[5] Huynh KD, Bardwell VJ; Oncogene 1998;17:2473-2484.
[6] Wong CW, Privalsky ML; J Biol Chem 1998;273:27695-27702.

[0258] 62. (Bac_GSPproteins) Bacterial type II secretion system protein D signature

A number of bacterial proteins, some of which are involved in a general secretion pathway (GSP) for the export of
proteins (also called the type II pathway) [1 to 5], have been found to be evolutionarily related. These proteins are listed
below:

- The 'D' protein from the GSP operon of: Aeromonas (gene exeD); Erwinia (gene outD); Escherichia coli (gene
yheF); Klebsiella pneumoniae (gene pulD); Pseudomonas aeruginosa (gene xcpQ); Vibrio cholerae (gene epsD) and
Xanthomonas campestris (gene xpsD).
- comE from Haemophilus influenzae, involved in competence (DNA uptake).
- pilQ from Pseudomonas aeruginosa, which is essential for the formation of the pili.
- hofQ (hopQ) from Escherichia coli.
- hrpH from Pseudomonas syringae, which is involved in the secretion of a proteinaceous elicitor of the
hypersensitivity response in plants.
- hrpA1 from Xanthomonas campestris pv. vesicatoria, which is also involved in the hypersensitivity response.
- mxiD from Shigella flexneri, which is involved in the secretion of the Ipa invasins which are
necessary for penetration of intestinal epithelial cells.
- omc from Neisseria gonorrhoeae.
- yscC from Yersinia enterocolitica virulence plasmid pYV, which seems to be required for the export of the Yop
virulence proteins.
- The gpIV protein from filamentous phages such as f1, ike, or m13. GpIV is said to be involved in phage assembly and
morphogenesis.

These proteins all seem to start with a signal sequence and are thought to be integral proteins in the outer
membrane. As a signature pattern, a conserved region in the C-terminal section of these proteins has been selected.

Consensus pattern: [GR]-[DEQKG]-[STVM]-[LIVMA](3)-[GA]-G-[LIVMFY]-x(11)-[LIVM]-P-[LIVMFYWGS]-[LIVMF]-[GSAE]-x-[LIVM]-P-[LIVMFYW](2)-x(2)-[LV]-F

[1] Salmond G.P.C., Reeves P.J. Trends Biochem. Sci. 18:7-12(1993).
[2] Reeves P.J., Whitcombe D., Wharam S., Gibson M., Allison G., Bunce N., Barallon R., Douglas P., Mulholland
V., Stevens S., Walker S., Salmond G.P.C. Mol. Microbiol. 8:443-456(1993).
[3] Martin P.R., Hobbs M., Free P.D., Jeske Y., Mattick J.S. Mol. Microbiol. 9:857-868(1993).
[4] Hobbs M., Mattick J.S. Mol. Microbiol. 10:233-243(1993).
[5] Genin S., Boucher C.A. Mol. Gen. Genet. 243:112-118(1994).

[0259] 63. (Bac_globin) Protozoan/cyanobacterial globins signature

Globins are heme-containing proteins involved in binding and/or transporting oxygen [1]. Almost all globins belong to
a large family (see <PDOC00793>); the only exceptions are the following proteins, which form a family of their own
[2,3]:

- Monomeric hemoglobins from the protozoan Paramecium caudatum, Tetrahymena pyriformis and Tetrahymena
thermophila.
- Cyanoglobin from the cyanobacteria Nostoc commune.
- Globins LI637 and LI410 from the chloroplast of the alga Chlamydomonas eugametos.
- Mycobacterium tuberculosis hypothetical protein MtCY48.23.

These proteins contain a conserved histidine which could be involved in heme-binding. As a signature pattern, a
conserved region that ends with this residue was used.

Consensus pattern: F-[LF]-x(5)-G-[PA]-x(4)-G-[KRA]-x-[LIVM]-x(3)-H

[1] Concise Encyclopedia Biochemistry, Second Edition, Walter de Gruyter, Berlin New York (1988).
[2] Takagi T. Curr. Opin. Struct. Biol. 3:413-418(1993).
[3] Couture M., Chamberland H., St-Pierre B., Lafontaine J., Guertin M. Mol. Gen. Genet. 243:185-197(1994).

[0260] 64. Band 7 protein family signature

Mammalian band 7 protein [1] (also known as 7.2B or stomatin) is an integral membrane phosphoprotein of red blood
cells thought to regulate cation conductance by interacting with other proteins of the junctional complex of the
membrane skeleton. Structurally, band 7 is evolutionarily related to the following proteins:

- Caenorhabditis elegans protein mec-2 [2]. Mec-2 positively regulates the activity of the putative mechanosensory
transduction channel. It may link the mechanosensory channel and the microtubule cytoskeleton of the touch receptor
neurons.
- Caenorhabditis elegans proteins sto-1 to sto-4.
- Caenorhabditis elegans protein unc-1.
- Escherichia coli hypothetical protein ybbK.
- Mycobacterium tuberculosis hypothetical protein MtCY277.09.
- Synechocystis strain PCC 6803 hypothetical protein slr1128.
- Methanococcus jannaschii hypothetical protein MJ0827.

Structurally, all these proteins consist of a short N-terminal domain which is followed by a transmembrane region and
a C-terminal domain of variable size (from 170 to 350 residues). As a signature pattern, a conserved region located
about 110 residues after the transmembrane domain was selected.

Consensus pattern: R-x(2)-[LIV]-[SAN]-x(6)-[LIV]-D-x(2)-T-x(2)-W-G-[LIV]-[KRH]-[LIV]-x-[KR]-[LIV]-E-[LIV]-[KR]

[1] Gallagher P.G., Forget B.G. J. Biol. Chem. 270:26358-26363(1995).
[2] Huang M., Gu G., Ferguson E.L., Chalfie M. Nature 378:292-295(1995).

[0261] 65. Barwin domain signatures

Barwin [1] is a barley seed protein of 125 residues that binds weakly to a chitin analog. It contains six cysteines involved
in disulfide bonds, as shown in the following schematic representation.

xxxxxxxxxxxxxxxCxxxxxxxxxxCxxxxCxCxxxxxxxxCxxxxxxxxxxxxxxxxxxCx

'C': conserved cysteine involved in a disulfide bond. '*': position of the patterns.

Barwin is closely related to the following proteins:

- Hevein, a wound-induced protein found in the latex of rubber trees.
- HEL, an Arabidopsis thaliana hevein-like protein [2].
- Win1 and win2, two wound-induced proteins from potato.
- Pathogenesis-related protein 4 from tobacco.

Hevein and the win1/2 proteins consist of an N-terminal chitin-binding domain
followed by a barwin-like C-terminal domain. Barwin and its related proteins could be involved in a defense mechanism
in plants. As signature patterns, two highly conserved regions that contain some of the cysteines were selected.

Consensus pattern: C-G-[KR]-C-L-x-V-x-N [The two C's are involved in disulfide bonds]
Consensus pattern: V-[DN]-Y-[EQ]-F-V-[DN]-C [C is involved in a disulfide bond]

[1] Svensson B., Svendsen I., Hoejrup P., Roepstorff P., Ludvigsen S., Poulsen F.M. Biochemistry
31:8767-8770(1992).
[2] Potter S., Uknes S., Lawton K., Winter A.M., Chandler D., Dimaio J., Novitzky R., Ward E., Ryals J. Mol. Plant
Microbe Interact. 6:680-685(1993).

[0262] 66. (Bowman-Birk_leg) Bowman-Birk serine protease inhibitors family signature

The Bowman-Birk inhibitor family [1] is one of the numerous families of serine proteinase
inhibitors. As can be seen in the schematic representation, they have a duplicated structure and generally possess
two distinct inhibitory sites:

xxCCxxCxxCxx#xxCxxCxxxxCxxxCxxxCxxxxCxx#xxCxxCxxCxxCxx
< 70 residues >

'C': conserved cysteine involved in a disulfide bond.
'#': active site residue.
'*': position of the pattern.


[0263] These inhibitors are found in the seeds of all leguminous plants as well as in cereal grains. In cereals they
exist in two forms, one of which is a duplication of the basic structure shown above [2]. The pattern that was developed
to pick up sequences belonging to this family of inhibitors is in the central part of the domain and includes four cysteines.
[0264] Consensus pattern: C-x(5,6)-[DENQKRHSTA]-C-[PASTDH]-[PASTDK]-[ASTDV]-C-[NDKS]-[DEKRHSTA]-C
[The four C's are involved in disulfide bonds] Note: this pattern can be found twice in some duplicated cereal inhibitors.
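
Because the pattern may occur twice in duplicated cereal inhibitors, a scan should collect every match rather than stop at the first. The Python sketch below, offered as an illustration only, lists the non-overlapping matches of the Bowman-Birk pattern and their coordinates; the function name and the constructed test strings are hypothetical and do not correspond to any real inhibitor sequence.

```python
import re

# Hand translation of the Bowman-Birk consensus pattern; x(5,6) becomes ".{5,6}".
BOWMAN_BIRK = re.compile(
    r"C.{5,6}[DENQKRHSTA]C[PASTDH][PASTDK][ASTDV]C[NDKS][DEKRHSTA]C"
)


def signature_spans(sequence):
    """Return 1-based inclusive (start, end) spans of non-overlapping matches."""
    return [(m.start() + 1, m.end()) for m in BOWMAN_BIRK.finditer(sequence.upper())]


# Hypothetical 15-residue unit that satisfies the pattern.
unit = "CAAAAADCPPACNDC"
single = "GG" + unit + "GG"
duplicated = "GG" + unit + "GGGG" + unit + "GG"

print(signature_spans(single))       # prints [(3, 17)]
print(signature_spans(duplicated))   # prints [(3, 17), (22, 36)]
```

Two spans in one sequence would be consistent with the duplicated form described above, while a single span would be consistent with the basic structure.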

[1] Laskowski M., Kato I. Annu. Rev. Biochem. 49:593-626(1980).
[2] Tashiro M., Hashino K., Shiozaki M., Ibuki F., Maki Z. J. Biochem. 102:297-306(1987).



[0265] 67. Pathogenesis-related protein Bet v I family signature

[0266] A number of plant proteins, which all seem to be involved in pathogen defense response, are structurally
related [1,2,3]. These proteins are:

- Bet v I, the major pollen allergen from white birch. Bet v I is the main cause of type I allergic reactions in Europe,
North America and USSR.
- Aln g I, the major pollen allergen from alder.
- Api g I, the major allergen from celery.
- Car b I, the major pollen allergen from hornbeam.
- Cor a I, the major pollen allergen from hazel.
- Mal d I, the major pollen allergen from apple.
- Asparagus wound-induced protein AoPR1.
- Kidney bean pathogenesis-related proteins 1 and 2.
- Parsley pathogenesis-related proteins PR1-1 and PR1-3.
- Pea disease resistance response proteins pI49, pI176 and DRRG49-C.
- Pea abscisic acid-responsive proteins ABR17 and ABR18.
- Potato pathogenesis-related proteins STH-2 and STH-21.
- Soybean stress-induced protein SAM22.

[0267] These proteins are thought to be intracellularly located. They contain from 155 to 160 amino acid residues.
As a signature pattern, a conserved region located in the third quarter of these proteins has been selected.

Consensus pattern: G-x(2)-[LIVMF]-x(4)-E-x(2)-[CSTAEN]-x(8,9)-[GND]-G-[GS]-[CS]-x(2)-K-x(4)-[FY]

[1] Breiteneder H., Pettenburger K., Bito A., Valenta R., Kraft D., Rumpold H., Scheiner O., Breitenbach M. EMBO
J. 8:1935-1938(1989).
[2] Crowell D., John M.E., Russell D., Amasino R.M. Plant Mol. Biol. 18:459-466(1992).
[3] Warner S.A.J., Scott R., Draper J. Plant Mol. Biol. 19:555-561(1992).

[0268] 68. bZIP transcription factors basic domain signature

The bZIP superfamily [1,2] of eukaryotic DNA-binding transcription factors groups together proteins that contain a
basic region mediating sequence-specific DNA-binding followed by a leucine zipper required for dimerization. This
family is quite large; therefore only a partial list of some representative members appears here.

- Transcription factor AP-1, which binds selectively to enhancer elements in the cis control regions of SV40 and
metallothionein IIA. AP-1, also known as c-jun, is the cellular homolog of the avian sarcoma virus 17 (ASV17)
oncogene v-jun.
- Jun-B and jun-D, probable transcription factors which are highly similar to jun/AP-1.
- The fos protein, a proto-oncogene that forms a non-covalent dimer with c-jun.
- The fos-related proteins fra-1 and fos B.
- Mammalian cAMP response element (CRE) binding proteins CREB, CREM, ATF-1, ATF-3, ATF-4, ATF-5, ATF-6
and LRF-1.
- Maize Opaque 2, a trans-acting transcriptional activator involved in the regulation of the production of zein proteins
during endosperm development.
- Arabidopsis G-box binding factors GBF1 to GBF4, parsley CPRF-1 to CPRF-3, tobacco TAF-1 and wheat EMBP-1.
All these proteins bind the G-box promoter elements of many plant genes.
- Drosophila protein Giant, which represses the expression of both the kruppel and knirps segmentation gap genes.
- Drosophila Box B binding factor 2 (BBF-2), a transcriptional activator that binds to fat body-specific enhancers of
alcohol dehydrogenase and yolk protein genes.
- Drosophila segmentation protein cap'n'collar (gene cnc), which is involved in head morphogenesis.
- Caenorhabditis elegans skn-1, a developmental protein involved in the fate of ventral blastomeres in the early embryo.
- Yeast GCN4 transcription factor, a component of the general control system that regulates the expression of amino
acid-synthesizing enzymes in response to amino acid starvation, and the related Neurospora crassa cpc-1 protein.
- Neurospora crassa cys-3, which turns on the expression of structural genes which encode sulfur-catabolic enzymes.
- Yeast MET28, a transcriptional activator of sulfur amino acids metabolism.
- Yeast PDR4 (or YAP1), a transcriptional activator of the genes for some oxygen detoxification enzymes.
- Epstein-Barr virus trans-activator protein BZLF1.

Consensus pattern: [KR]-x(1,3)-[RKSAQ]-N-x(2)-[SAQ](2)-x-[RKTAENQ]-x-R-x-[RK]

[1] Hurst H.C. Protein Prof. 2:105-168(1995).
[2] Ellenberger T. Curr. Opin. Struct. Biol. 4:12-21(1994).

[0269] 69. Biotin-requiring enzymes attachment site

Biotin, which plays a catalytic role in some carboxyl transfer reactions, is covalently attached, via an amide bond, to a
lysine residue in enzymes requiring this coenzyme [1,2,3,4]. Such enzymes are:

- Pyruvate carboxylase (EC 6.4.1.1).
- Acetyl-CoA carboxylase (EC 6.4.1.2).
- Propionyl-CoA carboxylase (EC 6.4.1.3).
- Methylcrotonoyl-CoA carboxylase (EC 6.4.1.4).
- Geranoyl-CoA carboxylase (EC 6.4.1.5).
- Urea carboxylase (EC 6.3.4.6).
- Oxaloacetate decarboxylase (EC 4.1.1.3).
- Methylmalonyl-CoA decarboxylase (EC 4.1.1.41).
- Glutaconyl-CoA decarboxylase (EC 4.1.1.70).
- Methylmalonyl-CoA carboxyl-transferase (EC 2.1.3.1) (transcarboxylase).

Sequence data reveal that the region around the biocytin (biotin-lysine) residue is well conserved and can be used as
a signature pattern.

[0270] Consensus pattern: [GN]-[DEQTR]-x-[LIVMFY]-x(2)-[LIVM]-x-[AIV]-M-K-[LMAT]-x(3)-[LIVM]-x-[SAV] [K is the
biotin attachment site] Note: the domain around the biotin-binding lysine residue is evolutionarily related to that around
the lipoyl-binding lysine residue of 2-oxo acid dehydrogenase acyltransferases.

[1] Knowles J.R. Annu. Rev. Biochem. 58:195-221(1989).
[2] Samols D., Thornton C.G., Murtif V.L., Kumar G.K., Haase F.C., Wood H.G. J. Biol. Chem.
263:6461-6464(1988).
[3] Goss N.H., Wood H.G. Meth. Enzymol. 107:261-278(1984).
[4] Shenoy B.C., Xie Y., Park V.L., Kumar G.K., Beegen H., Wood H.G., Samols D. J. Biol. Chem.
267:18407-18412(1992).

[0271] 2-oxo acid dehydrogenases acyltransferase component lipoyt binding site 

The 2-oxo acid dehydrogenase multienzyme complexes [1 ,2] from bacterial and eukaryotic sources catalyze the oxi- 
dative decarboxylation of 2-oxo acids to the corresponding acyl-CoA. The three members of this family of multienzyme 
complexes are: 

10 

Pyruvate dehydrogenase complex (PDC). 
2-oxoglutarate dehydrogenase complex (OGDC). 
Branched-chain 2-oxo acid dehydrogenase complex (BCOADC). 

15 These three complexes share a common architecture: they are composed of multiple copies of three component en- 
zymes - E1, E2 and E3. El is a thiamine pyrophosphate-dependent 2-oxo acid dehydrogenase, E2 a dihydrolipamide 
acyltransferase, and E3 an FAD-contatning dihydrolipamide dehydrogenase. 

[0272] E2 acyltransferases have an essential cofactor, lipoic acid, which is covalently bound via an amide linkage to
a lysine group. The E2 components of OGDC and BCOADC bind a single lipoyl group, while those of PDC bind either
one (in yeast and in Bacillus), two (in mammals), or three (in Azotobacter and in Escherichia coli) lipoyl groups [3].
In addition to the E2 components of the three enzymatic complexes described above, a lipoic acid cofactor is also
found in the following proteins:

- H-protein of the glycine cleavage system (GCS) [4]. GCS is a multienzyme complex of four protein components,
which catalyzes the degradation of glycine. H protein shuttles the methylamine group of glycine from the P protein
to the T protein. H-protein from either prokaryotes or eukaryotes binds a single lipoic group.

- Mammalian and yeast pyruvate dehydrogenase complexes differ from those of other sources, in that they contain,
in small amounts, a protein of unknown function, designated protein X or component X. Its sequence is closely
related to that of E2 subunits and seems to bind a lipoic group [5].

- Fast migrating protein (FMP) (gene acoC) from Alcaligenes eutrophus [6]. This protein is most probably a
dihydrolipoamide acyltransferase involved in acetoin metabolism.

A signature pattern was developed which allows the detection of the lipoyl-binding site.

[0273] Consensus pattern: [GN]-x(2)-[LIVF]-x(5)-[LIVFC]-x(2)-[LIVFA]-x(3)-K-[STAIV]-[STAVQDN]-x(2)-[LIVMFS]-x(5)-[GCN]-x-[LIVMFY]
[K is the lipoyl-binding site]. Note: the domain around the lipoyl-binding lysine residue is evolutionarily
related to that around the biotin-binding lysine residue of biotin-requiring enzymes.

[1] Yeaman S.J. Biochem. J. 257:625-632(1989).
[2] Yeaman S.J. Trends Biochem. Sci. 11:293-296(1986).
[3] Russel G.C., Guest J.R. Biochim. Biophys. Acta 1076:225-232(1991).
[4] Fujiwara K., Okamura-Ikeda K., Motokawa Y. J. Biol. Chem. 261:8836-8841(1986).
[5] Behal R.H., Browning K.S., Hall T.B., Reed L.J. Proc. Natl. Acad. Sci. U.S.A. 86:8732-8736(1989).
[6] Priefert H., Hein S., Krueger N., Zeh K., Schmidt B., Steinbuechel A. J. Bacteriol. 173:4056-4071(1991).

[0274] 70. C2 (C2 domain) Number of members: 295

Some isozymes of protein kinase C (PKC) [1,2] contain a domain, known as C2, of about 116 amino-acid residues
which is located between the two copies of the C1 domain (that bind phorbol esters and diacylglycerol) (see
<PDOC00379>) and the protein kinase catalytic domain (see <PDOC00100>). Regions with significant homology
[3,E1] to the C2-domain have been found in the following proteins:

- PKC isoforms alpha, beta and gamma and Drosophila isoforms PKC1 and PKC2.

- PKC isoforms delta, epsilon and eta, Caenorhabditis elegans kin-13 and yeast PKC1 have a C2-like domain at
the N-terminal extremity [4].

- Yeast cAMP dependent protein kinase SCH9 contains a C2-like domain.

- Mammalian phosphatidylinositol-specific phospholipase C (PI-PLC) (see <PDOC50007>) isoforms beta, gamma
and delta as well as several non-mammalian PI-PLCs have a C2-like domain C-terminal of the catalytic domain.

- Mammalian and plant phosphatidylinositol-3-kinases have a C2-like domain in the central region of the 110 Kd
catalytic subunit.






- Yeast phosphatidylserine-decarboxylase 2 (gene PSD2) contains a C2 domain in its central region.

- Cytosolic phospholipase D from plants and cytosolic phospholipase A2 have a C2-like domain at their N-terminus.

- Synaptotagmins (p65). This is a family of related synaptic vesicle proteins that bind acidic phospholipids and that
may have a regulatory role in the membrane interactions during trafficking of synaptic vesicles at the active zone
of the synapse. All isoforms of synaptotagmins have two copies of the C2 domain in their C-terminal region.

- Rabphilin-3A, a synaptic protein, contains two C2 domains.

- Caenorhabditis elegans protein unc-13, whose function is not known. Unc-13 has a C2 domain in its central part
and a C2-like domain at the C-terminus.

- rasGAP and the breakpoint cluster protein bcr have a C2-domain C-terminal of a PH-domain.

- Yeast protein BUD2 (or CLA2) has a C2-domain in the central region.

- Yeast protein RSP5 and human protein NEDD-4; both proteins also contain WW domains (see <PDOC50020>).

- Perforin (see <PDOC00251>) has a C2 domain at the C-terminus. It is the only extracellular protein known to
contain a C2 domain.

- Yeast hypothetical protein YML072C has a C2 domain.

- Yeast hypothetical protein YNL087W has three C2 domains.

- Caenorhabditis elegans hypothetical protein F37A4.7 has two C2 domains.

The C2 domain is thought to be involved in calcium-dependent phospholipid binding [5]. Since domains related to
the C2 domain are also found in proteins that do not bind calcium, other putative functions for the C2 domain, such
as binding to inositol-1,3,4,5-tetraphosphate, have been suggested [6]. Recently, the 3D structure of the first C2
domain of synaptotagmin has been reported [7]; the domain forms an eight-stranded beta sandwich. The signature
pattern that has been developed for the C2 domain is located in a conserved part of that domain, the connecting
loop between beta strands 2 and 3. A profile has been developed for the C2 domain that covers the total domain.

- Consensus pattern: [ACG]-x(2)-L-x(2,3)-D-x(1,2)-[NGSTLIF]-[GTMR]-x-[STAP]-D-[PA]-[FY]

- Note: this documentation entry is linked to both a signature pattern and a profile. As the profile is much more
sensitive than the pattern, you should use it if you have access to the necessary software tools to do so.
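The C2 signature contains variable-length gaps, x(2,3) and x(1,2), so a match does not have a fixed length. As a rough illustration, the following Python sketch computes the shortest and longest stretch of residues that a pattern in this notation can cover; the helper name pattern_length_range is hypothetical, and the pattern string is copied from the consensus pattern above.

```python
import re

def pattern_length_range(pattern):
    """Return (shortest, longest) number of residues a PROSITE-style pattern can span."""
    shortest = longest = 0
    for element in pattern.split("-"):
        # A trailing "(n)" or "(n,m)" gives the repeat count of the element.
        repeat = re.search(r"\((\d+)(?:,(\d+))?\)$", element)
        if repeat:
            low = int(repeat.group(1))
            high = int(repeat.group(2)) if repeat.group(2) else low
        else:
            low = high = 1  # an element without a count covers one residue
        shortest += low
        longest += high
    return shortest, longest

C2_PATTERN = "[ACG]-x(2)-L-x(2,3)-D-x(1,2)-[NGSTLIF]-[GTMR]-x-[STAP]-D-[PA]-[FY]"
c2_span = pattern_length_range(C2_PATTERN)  # (15, 17)
```

Under these assumptions the signature covers 15 to 17 residues, a small part of the roughly 116-residue domain, which is consistent with the statement that the profile, covering the total domain, is the more sensitive of the two.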

[0275] [1]Medline: 96367095. Extending the C2 domain family: C2s in PKCs delta, epsilon, eta and theta,
phospholipases, GAPs and perforin. Ponting CP, Parker PJ; Protein Sci 1996;5:162-166.

[1] Azzi A., Boscoboinik D., Hensey C. Eur. J. Biochem. 208:547-557(1992).
[2] Stabel S. Semin. Cancer Biol. 5:277-284(1994).
[3] Brose N., Hofmann K.O., Hata Y., Suedhof T.C. J. Biol. Chem. 270:25273-25280(1995).
[4] Sossin W.S., Schwartz J.H. Trends Biochem. Sci. 18:207-208(1993).
[5] Davletov B.A., Suedhof T.C. J. Biol. Chem. 268:26386-26390(1993).
[6] Fukuda M., Aruga J., Niinobe M., Aimoto S., Mikoshiba K. J. Biol. Chem. 269:29206-29211(1994).
[7] Sutton R.B., Davletov B.A., Berghuis A.M., Suedhof T.C., Sprang S.R. Cell 80:929-938(1995).

[0276] 71. CAP (CAP protein) Number of members: 11
In budding and fission yeasts the CAP protein is a bifunctional protein whose N-terminal domain binds to adenylyl
cyclase, thereby enabling that enzyme to be activated by upstream regulatory signals, such as Ras. The function of
the C-terminal domain is less clear, but it is required for normal cellular morphology and growth control [1]. CAP is
conserved in higher eukaryotic organisms where its function is not yet clear [2].

Structurally, CAP is a protein of 474 to 551 residues which consists of two domains separated by a proline-rich hinge.
Two signature patterns, one corresponding to a conserved region in the N-terminal extremity and the other to a
C-terminal region, have been developed.

- Consensus pattern: [LIVM](2)-x-R-L-[DE]-x(4)-R-L-E

- Consensus pattern: D-[LIVMFY]-x-E-x-[PA]-x-P-E-Q-[LIVMFY]-K

[1] Kawamukai M., Gerst J., Field J., Riggs M., Rodgers L., Wigler M., Young D. Mol. Biol. Cell 3:167-180(1992).
[2] Yu G., Swiston J., Young D. J. Cell Sci. 107:1671-1678(1994).

[0277] 72. CAP_GLY (CAP-Gly domain)
CAP stands for cytoskeleton-associated proteins. Swiss:P39937 may be a member but has not been included. It has
a weak match to the family between residues 22-67. Number of members: 24

[1]Medline: 93242656. Sequence homologies between four cytoskeleton-associated proteins. Riehemann K, Sorg C;
Trends Biochem Sci 1993;18:82-83.

[0278] It has been shown [1] that some cytoskeleton-associated proteins (CAP) share the presence of a conserved,
glycine-rich domain of about 42 residues, called here CAP-Gly. Proteins known to contain this domain are listed below.

- Restin (also known as cytoplasmic linker protein-170 or CLIP-170), a 160 Kd protein associated with intermediate
filaments and that links endocytic vesicles to microtubules. Restin contains two copies of the CAP-Gly domain.

- Vertebrate dynactin (150 Kd dynein-associated polypeptide; DAP) and Drosophila glued, a major component of
activator I, a 20S polypeptide complex that stimulates dynein-mediated vesicle transport.

- Yeast protein BIK1, which seems to be required for the formation or stabilization of microtubules during mitosis and
for spindle pole body fusion during conjugation.

- Yeast protein NIP100 (NIP80).

- Human protein CKAP1/TFCB, Schizosaccharomyces pombe protein alp11 and Caenorhabditis elegans hypothetical
protein F53F4.3. These proteins contain an N-terminal ubiquitin domain (see <PDOC00271>) and a C-terminal
CAP-Gly domain.

- Caenorhabditis elegans hypothetical protein M01A8.2.

- Yeast hypothetical protein YNL148c.

Structurally, these proteins are made of three distinct parts: an N-terminal section that is most probably globular and
contains the CAP-Gly domain, a large central region predicted to be in an alpha-helical coiled-coil conformation and,
finally, a short C-terminal globular domain. The signature for the CAP-Gly domain corresponds to the first 32 residues
of the domain and includes five of the six conserved glycines.

- Consensus pattern: G-x(8,10)-[FYW]-x-G-[LIVM]-x-[LIVMFY]-x(4)-G-K-[NH]-x-G-[STAR]-x(2)-G-x(2)-[LY]-F

[1] Riehemann K., Sorg C. Trends Biochem. Sci. 18:82-83(1993).

[0279] 73. CBD_1

Cellulose-binding domain, fungal type

The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases (EC 
3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1]. 

[0280] Structurally, cellulases and xylanases generally consist of a catalytic domain joined to a cellulose-binding
domain (CBD) by a short linker sequence rich in proline and/or hydroxy-amino acids.

[0281] The CBD of a number of fungal cellulases has been shown to consist of 36 amino acid residues. Enzymes
known to contain such a domain are:

- Endoglucanase I (gene egl1) from Trichoderma reesei.
- Endoglucanase II (gene egl2) from Trichoderma reesei.
- Endoglucanase V (gene egl5) from Trichoderma reesei.
- Exocellobiohydrolase I (gene CBHI) from Humicola grisea, Neurospora crassa, Phanerochaete chrysosporium,
Trichoderma reesei, and Trichoderma viride.
- Exocellobiohydrolase II (gene CBHII) from Trichoderma reesei.
- Exocellobiohydrolase 3 (gene cel3) from Agaricus bisporus.
- Endoglucanases B, C2, F and K from Fusarium oxysporum.

[0282] The CBD domain is found either at the N-terminal (Cbh-II or egl2) or at the C-terminal extremity (Cbh-I, egl1
or egl5) of these enzymes. As shown in the following schematic representation, there are four conserved cysteines
in this type of CBD domain, all involved in disulfide bonds.

xxxxxxxCxxxxxxxxxxCxxxxxCxxxxxxxxxCx
       ****************************

'C': conserved cysteine involved in a disulfide bond.
'*': position of the pattern.

[0283] Such a domain has also been found in a putative polysaccharide binding protein from the red alga Porphyra
purpurea [2]. Structurally, this protein consists of four tandem repeats of the CBD domain.

[0284] Consensus pattern: C-G-G-x(4,7)-G-x(3)-C-x(5)-C-x(3,5)-[NHG]-x-[FYWM]-x(2)-Q-C [The four C's are
involved in disulfide bonds]. Sequences known to belong to this class detected by the pattern: ALL.
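The positions of the four conserved cysteines can be read off a match directly. The sketch below is an illustration under stated assumptions: the consensus pattern above is written by hand as a Python regular expression with one capture group per cysteine, the function name conserved_cysteines is hypothetical, and the 36-residue peptide was constructed to satisfy the pattern rather than taken from this disclosure.

```python
import re

# C-G-G-x(4,7)-G-x(3)-C-x(5)-C-x(3,5)-[NHG]-x-[FYWM]-x(2)-Q-C,
# with each conserved cysteine wrapped in its own capture group.
CBD_REGEX = re.compile(
    r"(C)GG.{4,7}G.{3}(C).{5}(C).{3,5}[NHG].[FYWM].{2}Q(C)"
)

def conserved_cysteines(sequence):
    """Return the 1-based positions of the four pattern cysteines, or [] if no match."""
    match = CBD_REGEX.search(sequence)
    if match is None:
        return []
    return [match.start(group) + 1 for group in range(1, 5)]

# Hypothetical 36-residue peptide built to satisfy the pattern.
peptide = "TQSHYGQCGGIGYSGPTVCASGTTCQVLNPYYSQCL"
positions = conserved_cysteines(peptide)  # [8, 19, 25, 35]
```

For this peptide the cysteines fall at residues 8, 19, 25 and 35, the same spacing as the C positions in the 36-residue schematic representation of the domain.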

[1] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).

[2] Liu Q., der Meer J.P., Reith M.E.

[0285] 74. CBS domain. 3D Structure found as a subdomain in TIM barrel of inosine-. CBS domain web page. CBS
domains are small intracellular modules mostly found in two or four copies within a protein. CBS domains are found in
cystathionine-beta-synthase (CBS) where mutations lead to homocystinuria. Two CBS domains are found in
inosine-monophosphate dehydrogenase from all species; however, the CBS domains are not needed for activity. Two CBS
domains are found in intracellular loops of several chloride channels. Mutations in this domain of Swiss:P35520 lead
to homocystinuria. Number of members: 414

[1]Medline: 97172695. The structure of a domain common to archaebacteria and the homocystinuria disease
protein. Bateman A; Trends Biochem Sci 1997;22:12-13.

[2]Medline: 96279836. Structure and mechanism of inosine monophosphate dehydrogenase in complex with the
immunosuppressant mycophenolic acid. Sintchak MD, Fleming MA, Futer O, Raybuck SA, Chambers SP, Caron
PR, Murcko MA, Wilson KP; Cell 1996;85:921-930.
Discovery of CBS domain.

[3]Medline: 97259972. CBS domains in ClC chloride channels implicated in myotonia and nephrolithiasis (kidney
stones). Ponting CP; J Mol Med 1997;75:160-163.

[0286] 75. CDP-OH_P_transf (CDP-alcohol phosphatidyltransferase)
All of these members have the ability to catalyze the displacement of CMP from a CDP-alcohol by a second alcohol
with formation of a phosphodiester bond and concomitant breaking of a phosphoride anhydride bond. Number of
members: 32

A number of phosphatidyltransferases, which are all involved in phospholipid biosynthesis and which share the property
of catalyzing the displacement of CMP from a CDP-alcohol by a second alcohol with formation of a phosphodiester
bond and concomitant breaking of a phosphoride anhydride bond, share a conserved sequence region [1,2]. These
enzymes are:

- Ethanolaminephosphotransferase (EC 2.7.8.1) from yeast (gene EPT1).

- Diacylglycerol cholinephosphotransferase (EC 2.7.8.2) from yeast (gene CPT1).

- Phosphatidylglycerophosphate synthase (EC 2.7.8.5) (CDP-diacylglycerol-glycerol-3-phosphate
3-phosphatidyltransferase) from bacteria (gene pgsA).

- Phosphatidylserine synthase (EC 2.7.8.8) (CDP-diacylglycerol-serine O-phosphatidyltransferase) from yeast
(gene CHO1) and from Bacillus subtilis (gene pssA).

- Phosphatidylinositol synthase (EC 2.7.8.11) (CDP-diacylglycerol-inositol 3-phosphatidyltransferase) from yeast
(gene PIS).

These enzymes are proteins of 200 to 400 amino acid residues. The conserved region contains three aspartic
acid residues and is located in the N-terminal section of the sequences.

- Consensus pattern: D-G-x(2)-A-R-x(8)-G-x(3)-D-x(3)-D

[1]Medline: 97075020. Two-dimensional 1H-NMR of transmembrane peptides from Escherichia coli
phosphatidylglycerophosphate synthase in micelles. Morein S, Trouard TP, Hauksson JB, Rilfors L, Arvidson G, Lindblom G;
Eur J Biochem 1996;241:489-497.

[1] Nikawa J.-I., Kodaki T., Yamashita S. J. Biol. Chem. 262:4876-4881(1987).
[2] Hjelmstad R.H., Bell R.M. J. Biol. Chem. 266:5094-5134(1991).

[0287] 76. CHOD (Cholesterol oxidase) Members of the GMC oxidoreductase family. Number of members: 3
[0288] [1]Medline: 94032271. Crystal structure of cholesterol oxidase complexed with a steroid substrate:
implications for flavin adenine dinucleotide dependent alcohol oxidases. Li J, Vrielink A, Brick P, Blow DM;
Biochemistry 1993;32:11507-11515.

[0289] The following FAD flavoprotein oxidoreductases have been found [1,2] to be evolutionarily related. These
enzymes, which are called 'GMC oxidoreductases', are listed below.

- Glucose oxidase (EC 1.1.3.4) (GOX) from Aspergillus niger. Reaction catalyzed: glucose + oxygen ->
delta-gluconolactone + hydrogen peroxide.

- Methanol oxidase (EC 1.1.3.13) (MOX) from fungi. Reaction catalyzed: methanol + oxygen -> formaldehyde +
hydrogen peroxide.

- Choline dehydrogenase (EC 1.1.99.1) (CHD) from bacteria. Reaction catalyzed: choline + unknown acceptor ->
betaine aldehyde + reduced acceptor.

- Glucose dehydrogenase (GLD) (EC 1.1.99.10) from Drosophila. Reaction catalyzed: glucose + unknown acceptor
-> delta-gluconolactone + reduced acceptor.

- Cholesterol oxidase (CHOD) (EC 1.1.3.6) from Brevibacterium sterolicum and Streptomyces strain SA-COO.
Reaction catalyzed: cholesterol + oxygen -> cholest-4-en-3-one + hydrogen peroxide.

- AlkJ [3], an alcohol dehydrogenase from Pseudomonas oleovorans, which converts aliphatic medium-chain-length
alcohols into aldehydes.

This family also includes a lyase:

- (R)-mandelonitrile lyase (EC 4.1.2.10) (hydroxynitrile lyase) from plants [4], an enzyme involved in cyanogenesis,
the release of hydrogen cyanide from injured tissues.

These enzymes are proteins of size ranging from 556 (CHD) to 664 (MOX) amino acid residues which share a number
of regions of sequence similarity. One of these regions, located in the N-terminal section, corresponds to the FAD
ADP-binding domain. The function of the other conserved domains is not yet known; two of these domains have been
selected as signature patterns. The first one is located in the N-terminal section of these enzymes, about 50 residues
after the ADP-binding domain, while the second one is located in the central section.

- Consensus pattern: [GA]-[RKN]-x-[LIV]-G(2)-[GST](2)-x-[LIVM]-N-x(3)-[FYWA]-x(2)-[PAG]-x(5)-[DNESH]

- Consensus pattern: [GS]-[PSTA]-x(2)-[ST]-P-x-[LIVM](2)-x(2)-S-G-[LIVM]-G

[1] Cavener D.R. J. Mol. Biol. 223:811-814(1992).
[2] Henikoff S., Henikoff J.G. Genomics 19:97-107(1994).
[3] van Beilen J.B., Eggink G., Enequist H., Bos R., Witholt B. Mol. Microbiol. 6:3121-3136(1992).
[4] Cheng I.P., Poulton J.E. Plant Cell Physiol. 34:1139-1143(1993).

[0290] 77. CKS (Cyclin-dependent kinase regulatory subunit) Number of members: 11. Cyclin-dependent kinases
(CDK) are protein kinases which associate with cyclins to regulate eukaryotic cell cycle progression. The best
known CDK is p34-cdc2 (CDC28 in yeast), which is required for entry into S-phase and mitosis. CDKs bind to a
regulatory subunit which is essential for their biological function. This regulatory subunit is a small protein of 79 to 150
residues. In yeast (gene CKS1) and in fission yeast (gene suc1) a single isoform is known, while mammals have two
highly related isoforms. It has been shown [1] that these CDK regulatory subunits assemble as a hexamer which then
acts as a hub for the oligomerization of six CDK catalytic subunits. The sequences of CDK regulatory subunits are highly
conserved; therefore, the two most conserved regions have been used as signature patterns.

- Consensus pattern: Y-S-x-[KR]-Y-x-[DE](2)-x-[FY]-E-Y-R-H-V-x-[LV]-[PT]-[KRP]

- Consensus pattern: H-x-P-E-x-H-[IV]-L-L-F-[KR]
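Because two separate regions serve as signatures, a candidate sequence can be required to match both of them. The following Python sketch illustrates that check; the two regular expressions are hand translations of the consensus patterns above (the first is read with [PT] at its penultimate position), the labels CKS_1 and CKS_2 and the function names are hypothetical, and the test peptide was assembled by hand so that each pattern matches once.

```python
import re

# Hand-translated signatures; the dictionary keys are illustrative labels only.
CKS_SIGNATURES = {
    "CKS_1": re.compile(r"YS.[KR]Y.[DE]{2}.[FY]EYRHV.[LV][PT][KRP]"),
    "CKS_2": re.compile(r"H.PE.H[IV]LLF[KR]"),
}

def signature_hits(sequence):
    """Map each signature label to the 1-based start positions of its matches."""
    return {
        name: [m.start() + 1 for m in regex.finditer(sequence)]
        for name, regex in CKS_SIGNATURES.items()
    }

def looks_like_cks(sequence):
    """True only if every signature matches at least once."""
    return all(signature_hits(sequence).values())

# Hypothetical peptide: one hand-built match per signature, joined by alanines.
peptide = "MA" + "YSDKYDDEEFEYRHVMLPK" + "AAA" + "HEPEPHILLFR" + "AA"
hits = signature_hits(peptide)  # {"CKS_1": [3], "CKS_2": [25]}
```

Requiring both signatures is a stricter test than either pattern alone; a fragment that carries only the second region is rejected by looks_like_cks in this sketch.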

[0291] [1] Parge H.E., Arvai A.S., Murtari D.J., Reed S.I., Tainer J.A. Science 262:387-395(1993).
[0292] 78. CK_II_beta (Casein kinase II regulatory subunit)

Number of members: 16. Casein kinase II (CK-2) [1] is a ubiquitous eukaryotic serine/threonine protein kinase which
is found both in the cytoplasm and the nucleus and whose substrates are numerous. It generally phosphorylates Ser
or Thr at the N-terminal of a stretch of acidic residues (see <PDOC00006>). CK-2 exists as a heterotetramer composed
of two catalytic subunits (alpha) and two regulatory subunits (beta). In most species there are two closely related
isoforms of the catalytic subunit: alpha and alpha'. Some species, such as fungi and plants, express two forms of
regulatory subunits: beta and beta'. The exact function of the regulatory subunit is not yet known. It is a highly conserved
protein of about 25 Kd that contains, in its central section, a cysteine-rich motif that could be involved in binding a metal
such as zinc [2]. This region has been used as a signature pattern.

- Consensus pattern: C-P-x-[LIVMY]-x-C-x(5)-[LI]-P-[LIVMC]-G-x(9)-V-[KR]-x(2)-C-P-x-C

[1] Allende J.E., Allende C.C. FASEB J. 9:313-323(1995).
[2] Reed J.C., Bidwai A.P., Glover C.V.C. J. Biol. Chem. 269:18192-18200(1994).
[0293] 79. CLP_protease (Clp protease)

These proteins belong to family S14 in the classification of peptidases.

-!- The Clp protease has an active site catalytic triad. In E. coli Clp protease, ser-111, his-136 and asp-185 form
the catalytic triad.

-!- Swiss:P48254 has lost all of these active site residues and is therefore inactive.

-!- Swiss:P42379 contains two large insertions, Swiss:P42380 contains one large insertion.

Number of members: 38

[0294] The endopeptidase Clp (EC 3.4.21.92) from Escherichia coli cleaves peptides in various proteins in a process
that requires ATP hydrolysis [1,2]. Clp is a dimeric protein which consists of a proteolytic subunit (gene clpP) and either
of two related ATP-binding regulatory subunits (genes clpA and clpX). ClpP is a serine protease which has a
chymotrypsin-like activity. Its catalytic activity seems to be provided by a charge relay system similar to that of the trypsin
family of serine proteases, but which arose independently by convergent evolution. Proteases highly similar to ClpP
have been found to be encoded in the genome of the chloroplast of plants and seem to be also present in other
eukaryotes. The sequences around two of the residues involved in the catalytic triad (a serine and a histidine) are
highly conserved and can be used as signature patterns specific to that category of proteases.

- Consensus pattern: T-x(2)-[LIVMF]-G-x-A-[SAC]-S-[MSA]-[PAG]-[STA] [S is the active site residue]

- Consensus pattern: R-x(3)-[EAP]-x(3)-[LIVMFYT]-M-[LIVM]-H-Q-P [H is the active site residue] 
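Since each signature marks one catalytic residue, a match can be used to report the position of the putative active-site serine or histidine. The Python sketch below illustrates this with a named capture group. It rests on several assumptions: the regular expressions are hand translations of the two patterns, the serine pattern is read as ending in [MSA]-[PAG]-[STA], the names SER_SIGNATURE, HIS_SIGNATURE and active_site_positions are hypothetical, and the test peptide was constructed to satisfy both patterns and is not a sequence from this disclosure.

```python
import re

# The named group "active" marks the residue flagged as the active site.
SER_SIGNATURE = re.compile(r"T.{2}[LIVMF]G.A[SAC](?P<active>S)[MSA][PAG][STA]")
HIS_SIGNATURE = re.compile(r"R.{3}[EAP].{3}[LIVMFYT]M[LIVM](?P<active>H)QP")

def active_site_positions(sequence, signature):
    """Return the 1-based positions of the flagged residue for every match."""
    return [m.start("active") + 1 for m in signature.finditer(sequence)]

# Hypothetical peptide: a serine-signature match and a histidine-signature match
# separated by two glycines and flanked by alanines.
peptide = "AA" + "TICMGQAASMGA" + "GG" + "RFCLPNSRVMIHQP" + "AA"

serine_sites = active_site_positions(peptide, SER_SIGNATURE)     # [11]
histidine_sites = active_site_positions(peptide, HIS_SIGNATURE)  # [28]
```

In a full-length protein the two positions would be expected to be separated by a few dozen residues, as with ser-111 and his-136 of the E. coli enzyme, rather than by the short artificial spacer used here.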

[1]Medline: 98050920, The structure of ClpP at 2.3 angstroms resolution suggests a model for ATP-dependent 
proteolysis. Wang J, Hartling JA, Flanagan JM; Cell 1997;91:447-456. 

[1] Maurizi M.R., Clark W.P., Kim S.-H., Gottesman S. J. Biol. Chem. 265:12546-12552(1990).
[2] Gottesman S., Maurizi M.R. Microbiol. Rev. 56:592-621(1992).
[3] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[0295] 80. CNG_membrane (Transmembrane region cyclic Nucleotide Gated Channel) 

[1]Medline: 94224763. Cyclic nucleotide-gated channels: an expanding new family of ion channels. Yau KW; Proc Natl 
Acad Sci USA 1994;91:3481-3483. 

This family is found N-terminal to the cNMP_binding domain. Number of members: 56. Proteins that bind cyclic
nucleotides (cAMP or cGMP) share a structural domain of about 120 residues [1-3]. The best studied of these proteins is
the prokaryotic catabolite gene activator (also known as the cAMP receptor protein) (gene crp) where such a domain
is known to be composed of three alpha-helices and a distinctive eight-stranded, antiparallel beta-barrel structure.
Such a domain is known to exist in the following proteins:

- Prokaryotic catabolite gene activator protein (CAP).

- cAMP- and cGMP-dependent protein kinases (cAPK and cGPK). Both types of kinases contain two tandem copies
of the cyclic nucleotide-binding domain. The cAPKs are composed of two different subunits: a catalytic chain and
a regulatory chain which contains both copies of the domain. The cGPKs are single chain enzymes that include
the two copies of the domain in their N-terminal section. The nucleotide specificity of cAPK and cGPK is due to
an amino acid in the conserved region of beta-barrel 7: a threonine that is invariant in cGPK is an alanine in most
cAPK.

- Vertebrate cyclic nucleotide-gated ion-channels. Two such cation channels have been fully characterized. One
is found in rod cells where it plays a role in visual signal transduction. It specifically binds to cGMP, leading to an
opening of the channel and thereby causing a depolarization of rod photoreceptors. In olfactory epithelium a similar,
cAMP-binding, channel plays a role in odorant signal transduction.

There are six invariant amino acids in this domain, three of which are glycine residues that are thought to be
essential for maintenance of the structural integrity of the beta-barrel. Two signature patterns have been developed
for this domain. The first pattern is located within beta-barrels 2 and 3 and contains the first two conserved Gly.
The second pattern is located within beta-barrels 6 and 7 and contains the third conserved Gly as well as the three
other invariant residues.

- Consensus pattern: [LIVM]-[VIC]-x(2)-G-[DENQTA]-x-[GAC]-x(2)-[LIVMFY](4)-x(2)-G

- Consensus pattern: [LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R-[STAQ]-A-x-[LIVMA]-x-[STACV]

[1] Weber I.T., Shabb J.B., Corbin J.D. Biochemistry 28:6122-6127(1989).
[2] Kaupp U.B. Trends Neurosci. 14:150-157(1991).
[3] Shabb J.B., Corbin J.D. J. Biol. Chem. 267:5723-5726(1992).

[0296] 81. COX10.ctaB.cyoE (Cytochrome c oxidase assembly factor)

[1]Medline: 95191390. Biosynthesis and functional role of haem O and haem A. Mogi T, Saiki K, Anraku Y;
Mol Microbiol 1994;14:391-398.

Cytochrome c oxidase is a multi-subunit enzyme. The complexity of this enzyme requires assistance in building the
complex. This is carried out by the cytochrome c oxidase assembly factor.
Number of members: 31

[0297] Cytochrome c oxidase is an oligomeric enzymatic complex which seems to require the aid of a number of
proteins that either act as chaperonins to help the subunits of the enzyme to fold correctly, or assist in the assembly
of the metal centers [1]. One of these proteins is known as COX10 in yeast and as ctaB [2] in aerobic prokaryotes. It
is evolutionarily related to the cyoE protein from the Escherichia coli cytochrome O terminal oxidase complex.
[0298] These proteins probably contain [3] seven transmembrane segments. The most conserved region is located
in a loop between the second and third of these segments and has been selected as a signature pattern.

- Consensus pattern: [ED]-x-D-x(2)-M-x-R-T-x(2)-R-x(4)-G

[1] Nobrega M.P., Nobrega F.G., Tzagoloff A. J. Biol. Chem. 265:14220-14226(1990).
[2] Cao J., Hosler J., Shapleigh J., Revzin A., Ferguson-Miller S. J. Biol. Chem. 267:24273-24278(1992).
[3] Chepuri V., Gennis R.B. J. Biol. Chem. 265:12978-12986(1990).

[0299] 82. COX3 (Cytochrome c oxidase subunit III)
This family corresponds to chains C and P.

[1]Medline: 96216288. The whole structure of the 13-subunit oxidized cytochrome c oxidase at 2.8 A.
Tsukihara T, Aoyama H, Yamashita E, Tomizaki T, Yamaguchi H, Shinzawa-Itoh K, Nakashima R, Yaono R, Yoshikawa S;
Science 1996;272:1136-1144.
Number of members: 224

[0300] 83. COX5B (Cytochrome c oxidase subunit Vb)

[1]Medline: 96216288. The whole structure of the 13-subunit oxidized cytochrome c oxidase at 2.8 A.
Tsukihara T, Aoyama H, Yamashita E, Tomizaki T, Yamaguchi H, Shinzawa-Itoh K, Nakashima R, Yaono R, Yoshikawa S;
Science 1996;272:1136-1144.

This family consists of chains F and S.
Number of members: 10

[0301] Cytochrome c oxidase (EC 1.9.3.1) [1] is an oligomeric enzymatic complex which is a component of the
respiratory chain complex and is involved in the transfer of electrons from cytochrome c to oxygen. In eukaryotes this
enzyme complex is located in the mitochondrial inner membrane; in aerobic prokaryotes it is found in the plasma
membrane. In addition to the three large subunits that form the catalytic center of the enzyme complex there are, in
eukaryotes, a variable number of small polypeptidic subunits. One of these subunits, which is known as Vb in mammals,
V in slime mold and IV in yeast, binds a zinc atom. The sequence of subunit Vb is well conserved and includes three
conserved cysteines that are thought to coordinate the zinc ion [2]. Two of these cysteines are clustered in the
C-terminal section of the subunit; this region has been selected as a signature pattern.

- Consensus pattern: [LIVM](2)-[FYW]-x(10)-C-x(2)-C-G-x(2)-[FY]-K-L [The two C's probably bind zinc]

[1] Capaldi R.A., Malatesta F., Darley-Usmar V.M. Biochim. Biophys. Acta 726:135-148(1983).
[2] Rizzuto R., Sandona D., Brini M., Capaldi R.A., Bisson R. Biochim. Biophys. Acta 1129:100-104(1991).

[0302] 84. COesterase (Carboxylesterases) 
Cholinesterase pages 

The prints entry is specific to acetylcholinesterase 
Number of members: 273

[0303] Higher eukaryotes have many distinct esterases. Among the different types are those which act on carboxylic
esters (EC 3.1.1.-). Carboxyl-esterases have been classified into three categories (A, B and C) on the basis of
differential patterns of inhibition by organophosphates. The sequence of a number of type-B carboxylesterases indicates
[1,2,3] that the majority are evolutionarily related. This family currently consists of the following proteins:

- Acetylcholinesterase (EC 3.1.1.7) (AChE) [E1] from vertebrates and from Drosophila.

- Mammalian cholinesterase II (butyryl cholinesterase) (EC 3.1.1.8). Acetylcholinesterase and cholinesterase II are
closely related enzymes that hydrolyze choline esters [4].

- Mammalian liver microsomal carboxylesterases (EC 3.1.1.1).

- Drosophila esterase 6, produced in the anterior ejaculatory duct of the male insect reproductive system where it
plays an important role in its reproductive biology.

- Drosophila esterase P.

- Culex pipiens (mosquito) esterases B1 and B2.

- Myzus persicae (peach-potato aphid) esterases E4 and FE4.

- Mammalian bile-salt-activated lipase (BAL) [5], a multifunctional lipase which catalyzes fat and vitamin absorption.
It is activated by bile salts in infant intestine where it helps to digest milk fats.

- Insect juvenile hormone esterase (JH esterase) (EC 3.1.1.59).

- Lipases (EC 3.1.1.3) from the fungi Geotrichum candidum and Candida rugosa.

- Caenorhabditis gut esterase (gene ges-1).

- Duck fatty acyl-CoA hydrolase, medium chain (EC 3.1.2.14), an enzyme that may be associated with peroxisome
proliferation and may play a role in the production of 3-hydroxy fatty acid diester pheromones.

- Membrane enclosed crystal proteins from slime mold. These proteins are most probably esterases; the vesicles
where they are found have therefore been termed esterosomes.

[0304] So far two bacterial proteins have been found to belong to this family:

- Phenmedipham hydrolase (phenylcarbamate hydrolase), an Arthrobacter oxidans plasmid-encoded enzyme (gene
pcd) that degrades the phenylcarbamate herbicides phenmedipham and desmedipham by hydrolyzing their central
carbamate linkages.

- Para-nitrobenzyl esterase from Bacillus subtilis (gene pnbA).

[0305] The following proteins, while having lost their catalytic activity, contain a domain evolutionarily related to that
of carboxylesterases type-B:

- Thyroglobulin (TG), a glycoprotein specific to the thyroid gland, which is the precursor of the iodinated thyroid
hormones thyroxine (T4) and triiodothyronine (T3).

- Drosophila protein neurotactin (gene nrt), which may mediate or modulate cell adhesion between embryonic cells
during development.

- Drosophila protein glutactin (gene glt), whose function is not known.

[0306] As is the case for lipases and serine proteases, the catalytic apparatus of esterases involves three residues
(catalytic triad): a serine, a glutamate or aspartate and a histidine. The sequence around the active site serine is well
conserved and can be used as a signature pattern. A conserved region located in the N-terminal section containing a
cysteine involved in a disulfide bond has been selected as a second signature pattern.

- Consensus pattern: F-[GR]-G-x(4)-[LIVM]-x-[LIV]-x-G-x-S-[STAG]-G [S is the active site residue]

- Consensus pattern: [ED]-D-C-L-[YT]-[LIV]-[DNS]-[LIV]-[LIVFYW]-x-[PQR] [C is involved in a disulfide bond]

[1] Myers M., Richmond R.C., Oakeshott J.G. Mol. Biol. Evol. 5:113-119(1988).
[2] Krejci E., Duval N., Chatonnet A., Vincens P., Massoulie J. Proc. Natl. Acad. Sci. U.S.A. 88:6647-6651(1991).
[3] Cygler M., Schrag J.D., Sussman J.L., Harel M., Silman I., Gentry M.K., Doctor B.P. Protein Sci. 2:366-382(1993).
[4] Lockridge O. BioEssays 9:125-128(1988).
[5] Wang C.-S., Hartsuck J.A. Biochim. Biophys. Acta 1166:1-19(1993).

[0307] 85. CPSase_L_chain (Carbamoyl-phosphate synthase (CPSase))
[1]

Medline: 94347758

Three-dimensional structure of the biotin carboxylase subunit of acetyl-CoA carboxylase.
Waldrop GL, Rayment I, Holden HM;
Biochemistry 1994;33:10249-10256.

[2]

Medline: 90285162

Mammalian carbamyl phosphate synthetase (CPS). DNA sequence and evolution of the CPS domain of the Syrian
hamster multifunctional protein CAD.
Simmer JP, Kelly RE, Rinker AG Jr, Scully JL, Evans DR;
J Biol Chem 1990;265:10395-10402.

Carbamoyl-phosphate synthase catalyzes the ATP-dependent synthesis of carbamyl-phosphate from glutamine or
ammonia and bicarbonate. This important enzyme initiates both the urea cycle and the biosynthesis of arginine and/
or pyrimidines [2].

The carbamoyl-phosphate synthase (CPS) enzyme in prokaryotes is a heterodimer of a small and large chain. The
small chain promotes the hydrolysis of glutamine to ammonia, which is used by the large chain to synthesize carbamoyl
phosphate. See CPSase_sm_chain.

The small chain has a GATase domain in the carboxyl terminus.
See GATase.

Number of members: 181

[0308] Carbamoyl-phosphate synthase (CPSase) catalyzes the ATP-dependent synthesis of carbamyl-phosphate 
from glutamine (EC 6.3.5.5) or ammonia (EC 6.3.4.16) and bicarbonate [1]. This important enzyme initiates both the 
urea cycle and the biosynthesis of arginine and pyrimidines. 

[0309] Glutamine-dependent CPSase (CPSase II) is involved in the biosynthesis of pyrimidines and purines. In bacteria
such as Escherichia coli, a single enzyme is involved in both biosynthetic pathways while other bacteria have
separate enzymes. The bacterial enzymes are formed of two subunits: a small chain (gene carA) that provides
glutamine amidotransferase activity (GATase) necessary for removal of the ammonia group from glutamine, and a
large chain (gene carB) that provides CPSase activity. Such a structure is also present in fungi for arginine biosynthesis
(genes CPA1 and CPA2). In most eukaryotes, the first three steps of pyrimidine biosynthesis are catalyzed by a large
multifunctional enzyme - called URA2 in yeast, rudimentary in Drosophila and CAD in mammals [2]. The CPSase
domain is located between an N-terminal GATase domain and the C-terminal part which encompasses the dihydroorotase
and aspartate transcarbamylase activities.

[0310] Ammonia-dependent CPSase (CPSase I) is involved in the urea cycle in ureolytic vertebrates; it is a
monofunctional protein located in the mitochondrial matrix.

[0311] The CPSase domain is typically 120 Kd in size and has arisen from the duplication of an ancestral subdomain
of about 500 amino acids. Each subdomain independently binds to ATP and it is suggested that the two homologous
halves act separately, one to catalyze the phosphorylation of bicarbonate to carboxy phosphate and the other that of
carbamate to carbamyl phosphate.

[0312] The CPSase subdomain is also present in a single copy in the biotin-dependent enzymes acetyl-CoA
carboxylase (EC 6.4.1.2) (ACC), propionyl-CoA carboxylase (EC 6.4.1.3) (PCCase), pyruvate carboxylase (EC 6.4.1.1)
(PC) and urea carboxylase (EC 6.3.4.6).

[0313] Two conserved regions which are probably important for binding ATP and/or catalytic activity have been
selected as signatures for the subdomain.

- Consensus pattern: [FYV]-[PS]-[LIVMC]-[LIVMA]-[LIVM]-[KR]-[PSA]-[STA]-x(3)-[SG]-G-x-[AG]
- Consensus pattern: [LIVMF]-[LIMN]-E-[LIVMCA]-N-[PATLIVM]-[KR]-[LIVMSTAC]

[ 1] Simmer J.P., Kelly R.E., Rinker A.G. Jr., Scully J.L., Evans D.R.
J. Biol. Chem. 265:10395-10402(1990).
[ 2] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B.
BioEssays 15:157-164(1993).

[0314] 86. CPSase_sm_chain (Carbamoyl-phosphate synthase small chain, CPSase domain)
[1]

Medline: 90285162

Mammalian carbamyl phosphate synthetase (CPS). DNA sequence and evolution of the CPS domain of the Syrian
hamster multifunctional protein CAD.
Simmer JP, Kelly RE, Rinker AG Jr, Scully JL, Evans DR;
J Biol Chem 1990;265:10395-10402.

The carbamoyl-phosphate synthase domain is in the amino terminus of protein.

Carbamoyl-phosphate synthase catalyzes the ATP-dependent synthesis of carbamyl-phosphate from glutamine or
ammonia and bicarbonate. This important enzyme initiates both the urea cycle and the biosynthesis of arginine and/
or pyrimidines.

The carbamoyl-phosphate synthase (CPS) enzyme in prokaryotes is a heterodimer of a small and large chain. The
small chain promotes the hydrolysis of glutamine to ammonia, which is used by the large chain to synthesize carbamoyl
phosphate. See CPSase_L_chain.

The small chain has a GATase domain in the carboxyl terminus.
See GATase.

Number of members: 46
[0315] Carbamoyl-phosphate synthase (CPSase) catalyzes the ATP-dependent synthesis of carbamyl-phosphate
from glutamine (EC 6.3.5.5) or ammonia (EC 6.3.4.16) and bicarbonate [1]. This important enzyme initiates both the
urea cycle and the biosynthesis of arginine and pyrimidines.

[0316] Glutamine-dependent CPSase (CPSase II) is involved in the biosynthesis of pyrimidines and purines. In bacteria
such as Escherichia coli, a single enzyme is involved in both biosynthetic pathways while other bacteria have
separate enzymes. The bacterial enzymes are formed of two subunits: a small chain (gene carA) that provides
glutamine amidotransferase activity (GATase) necessary for removal of the ammonia group from glutamine, and a
large chain (gene carB) that provides CPSase activity. Such a structure is also present in fungi for arginine biosynthesis
(genes CPA1 and CPA2). In most eukaryotes, the first three steps of pyrimidine biosynthesis are catalyzed by a large
multifunctional enzyme - called URA2 in yeast, rudimentary in Drosophila and CAD in mammals [2]. The CPSase
domain is located between an N-terminal GATase domain and the C-terminal part which encompasses the dihydroorotase
and aspartate transcarbamylase activities.

[0317] Ammonia-dependent CPSase (CPSase I) is involved in the urea cycle in ureolytic vertebrates; it is a
monofunctional protein located in the mitochondrial matrix.

[0318] The CPSase domain is typically 120 Kd in size and has arisen from the duplication of an ancestral subdomain
of about 500 amino acids. Each subdomain independently binds to ATP and it is suggested that the two homologous
halves act separately, one to catalyze the phosphorylation of bicarbonate to carboxy phosphate and the other that of
carbamate to carbamyl phosphate.

[0319] The CPSase subdomain is also present in a single copy in the biotin-dependent enzymes acetyl-CoA
carboxylase (EC 6.4.1.2) (ACC), propionyl-CoA carboxylase (EC 6.4.1.3) (PCCase), pyruvate carboxylase (EC 6.4.1.1)
(PC) and urea carboxylase (EC 6.3.4.6).

[0320] Two conserved regions which are probably important for binding ATP and/or catalytic activity have been
selected as signatures for the subdomain.

- Consensus pattern: [FYV]-[PS]-[LIVMC]-[LIVMA]-[LIVM]-[KR]-[PSA]-[STA]-x(3)-[SG]-G-x-[AG]
- Consensus pattern: [LIVMF]-[LIMN]-E-[LIVMCA]-N-[PATLIVM]-[KR]-[LIVMSTAC]

[ 1] Simmer J.P., Kelly R.E., Rinker A.G. Jr., Scully J.L., Evans D.R. J. Biol. Chem. 265:10395-10402(1990).
[ 2] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B. BioEssays 15:157-164(1993).

[0321] 87. CRAL_TRIO (CRAL/TRIO domain)
[1]

Medline: 98121119

Crystal structure of the Saccharomyces cerevisiae phosphatidylinositol-transfer protein.
Sha B, Phillips SE, Bankaitis VA, Luo M;

The original profile has been extended to include the carboxyl domain from the known structure of Sec14. Swiss:
P10911 has not been included in the Pfam family because it does not appear to contain a complete structural domain.
Number of members: 39

[0322] 88. CSD ('Cold-shock' DNA-binding domain)
[1]

Medline: 94255482
Crystal structure of CspA, the major cold shock protein of Escherichia coli.
Schindelin H, Jiang W, Inouye M, Heinemann U;
Proc Natl Acad Sci U S A 1994;91:5119-5123.
Number of members: 121

[0323] A conserved domain of about 70 amino acids has been found in prokaryotic and eukaryotic DNA-binding
proteins [1,2,3,E1]. This domain, which is known as the 'cold-shock domain' (CSD), is present in the proteins listed below.

- Escherichia coli protein CS7.4 (gene cspA), which is induced in response to low temperature (cold-shock protein)
and which binds to and stimulates the transcription of the CCAAT-containing promoters of the H-NS protein and
of gyrA.

- Mammalian Y box binding protein 1 (YB1). A protein that binds to the CCAAT-containing Y box of mammalian HLA
class II genes.

- Xenopus Y box binding proteins -1 and -2 (Y1 and Y2). Proteins that bind to the CCAAT-containing Y box of
Xenopus hsp70 genes.

- Xenopus B box binding protein (YB3). YB3 binds the B box promoter element of genes transcribed by RNA
polymerase III.

- Enhancer factor I subunit A (EFI-A) (dbpB). A protein that also binds to the CCAAT motif in various gene promoters.
- DbpA, a human DNA-binding protein of unknown specificity.
- Bacillus subtilis cold-shock proteins cspB and cspC.
- Streptomyces clavuligerus protein SC 7.0.

- Escherichia coli proteins cspB, cspC, cspD, cspE and cspF.

- Unr, a mammalian gene encoded upstream of the N-ras gene. Unr contains nine repeats that are similar to the
CSD domain. The function of Unr is not yet known but it could be a multivalent DNA-binding protein.

[0324] As a signature pattern for the CSD domain, its most conserved region which is located in its N-terminal section
has been selected. It must be noted that the beginning of this region is highly similar [4] to the RNP-1 RNA-binding motif.

- Consensus pattern: [FY]-G-F-I-x(6,7)-[DER]-[LIVM]-F-x-H-x-[STKR]-x-[LIVMFY]

[ 1] Doniger J., Landsman D., Gonda M.A., Wistow G.
New Biol. 4:389-395(1992).
[ 2] Wistow G.
Nature 344:823-824(1990).
[ 3] Jones P.G., Inouye M.
Mol. Microbiol. 11:811-818(1994).
[ 4] Landsman D.
Nucleic Acids Res. 20:2861-2864(1992).

[0325] 89. CTF_NFI (CTF/NF-I family)
Number of members: 45

[0326] Nuclear factor I (NF-I) or CCAAT box-binding transcription factor (CTF) [1,2] (also known as TGGCA-binding
proteins) are a family of vertebrate nuclear proteins which recognize and bind, as dimers, the palindromic DNA
sequence 5'-TGGCANNNTGCCA-3'. CTF/NF-I binding sites are present in viral and cellular promoters and in the origin
of DNA replication of Adenovirus type 2.
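The palindromic character of the 5'-TGGCANNNTGCCA-3' site, and a search for candidate sites in a DNA sequence, can be illustrated with a short Python sketch. It assumes that N stands for any of the four nucleotides and that only the given strand needs to be scanned, because a palindromic site reads the same on the opposite strand. The function names and the test sequence are hypothetical.

```python
import re

# Binding site from the text; N is taken to mean any nucleotide (assumption).
NFI_SITE = "TGGCANNNTGCCA"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (N is kept as N)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """A site is palindromic when it equals its own reverse complement."""
    return site.upper() == reverse_complement(site)


def find_sites(dna: str, site: str = NFI_SITE) -> list:
    """Return 0-based start positions of all (possibly overlapping) site matches."""
    regex = site.upper().replace("N", "[ACGT]")
    # A lookahead lets overlapping occurrences be reported as well.
    return [m.start() for m in re.finditer("(?=" + regex + ")", dna.upper())]


# Hypothetical test sequence containing one candidate site.
positions = find_sites("AATGGCACGTTGCCATT")
```

A hit located in this way is only a candidate binding site; it does not by itself show that a CTF/NF-I dimer binds there.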

[0327] The CTF/NF-I proteins were first identified as nuclear factor I, a collection of proteins that activate the
replication of several Adenovirus serotypes (together with NF-II and NF-III) [3]. The family of proteins was also identified
as the CTF transcription factors, before the NFI and CTF families were found to be identical [4]. The CTF/NF-I proteins
are individually capable of activating transcription and DNA replication. The CTF/NF-I family has also been
called NFI, NF-I or NF1.

[0328] In a given species, there are a large number of different CTF/NF-I proteins. The multiplicity of CTF/NF-I is
known to be generated both by alternative splicing and by the occurrence of four different genes. The known forms of
NF-I genes have been classified as:

- The CTF-like factors subfamily (prototype form: CTF-1) [4].
- The NFI-X proteins.
- The NFI-A proteins.
- The NFI-B proteins.

[0329] So far all CTF/NF-I family members appear to have similar transcription and replication activities.
[0330] CTF/NF-I proteins contain 400 to 600 amino acids. The N-terminal 200 amino-acid sequence, almost
perfectly conserved in all species and genes sequenced, mediates site-specific DNA recognition, protein dimerization and
Adenovirus DNA replication. The C-terminal 100 amino acids contain the transcriptional activation domain. This
activation domain is the target of gene expression regulatory pathways elicited by growth factors and it interacts with basal
transcription factors and with histone H3 [6].

[0331] A perfectly conserved, highly charged 12 residue peptide located in the N-terminal part of CTF/NF-I has been
selected as a specific signature for this family of proteins.

- Consensus pattern: R-K-R-K-Y-F-K-K-H-E-K-R

[ 1] Mermod N., O'Neill E.A., Kelly T.J., Tjian R.
Cell 58:741-753(1989).
[ 2] Rupp R.A.W., Kruse U., Multhaup G., Goebel U., Beyreuther K., Sippel A.E.
Nucleic Acids Res. 18:2607-2616(1990).
[ 3] Nagata K., Guggenheimer R.A., Enomoto T., Lichy J.H., Hurwitz J.
Proc. Natl. Acad. Sci. U.S.A. 79:6438-6442(1982).
[ 4] Santoro C., Mermod N., Andrews P.C., Tjian R.
Nature 334:218-224(1988).
[ 5] Gil G., Smith J.R., Goldstein J.L., Slaughter C.A., Orth K., Brown M.S., Osborne T.F.
Proc. Natl. Acad. Sci. U.S.A. 85:8963-8967(1988).
[ 6] Alevizopoulos A., Dusserre Y., Tsai-Pflugfelder M., von der Weid T., Wahli W., Mermod N.
Genes Dev. 9:3051-3066(1995).

[0332] 90. Calsequestrin (Calsequestrin)

Number of members: 13

[0333] Calsequestrin is a moderate-affinity, high-capacity calcium-binding protein of cardiac and skeletal muscle [1],
where it is located in the lumenal space of the sarcoplasmic reticulum terminal cisternae. Calsequestrin acts as a
calcium buffer and plays an important role in the muscle excitation-contraction coupling. It is a highly acidic protein of
about 400 amino acid residues that binds more than 40 moles of calcium per mole of protein. There are at least two
different forms of calsequestrin: one which is expressed in cardiac muscles and another in skeletal muscles. Both
forms have highly similar sequences.

[0334] Two signature sequences have been developed. The first corresponds to the N-terminus of the mature protein,
the second is located just in front of the C-terminus of the protein which is composed of a highly acidic tail of variable
length.

- Consensus pattern: [EQ]-[DE]-G-L-[DN]-F-P-x-Y-D-G-x-D-R-V

- Consensus pattern: [DE]-L-E-D-W-[LIVM]-E-D-V-L-x-G-x-[LIVM]-N-T-E-D-D-D

[0335] [ 1] Treves S., Vilsen B., Chiozzi P., Andersen J.P., Zorzato F.
Biochem. J. 283:767-772(1992).
[0336] 91. Carboxyl_trans (Carboxyl transferase domain)

[1]

Medline: 93374821

Primary structure of the monomer of the 12S subunit of transcarboxylase as deduced from DNA and characterization
of the product expressed in Escherichia coli.
Thornton CG, Kumar GK, Haase FC, Phillips NF, Woo SB, Park VM,
Magner WJ, Shenoy BC, Wood HG, Samols D;
J Bacteriol 1993;175:5301-5308.
[2]

Medline: 93358891
Molecular evolution of biotin-dependent carboxylases.
Toh H, Kondo H, Tanabe T;
Eur J Biochem 1993;215:687-696.

All of the members in this family are biotin dependent carboxylases.

The carboxyl transferase domain carries out the following reaction: transcarboxylation from biotin to an acceptor
molecule. There are two recognised types of carboxyl transferase. One of them uses acyl-CoA and the other uses 2-oxo
acid as the acceptor molecule of carbon dioxide.
All of the members in this family utilise acyl-CoA as the acceptor molecule.
Number of members: 47

[0337] 92. Chal_stil_synt (Chalcone and stilbene synthases)
Number of members: 146

[0338] Chalcone synthases (CHS) (EC 2.3.1.74) and stilbene synthases (STS) (formerly known as resveratrol
synthases) are related plant enzymes [1]. CHS is an important enzyme in flavonoid biosynthesis and STS a key enzyme
in stilbene-type phytoalexin biosynthesis. Both enzymes catalyze the addition of three molecules of malonyl-CoA to a
starter CoA ester (a typical example is 4-coumaroyl-CoA), producing either a chalcone (with CHS) or stilbene (with
STS).

[0339] These enzymes are proteins of about 390 amino-acid residues. A conserved cysteine residue, located in the
central section of these proteins, has been shown [2] to be essential for the catalytic activity of both enzymes and
probably represents the binding site for the 4-coumaryl-CoA group. The region around this active site residue is well
conserved and can be used as a signature pattern.

[0340] In addition to the plant enzymes, this family also includes Bacillus subtilis bcsA.

- Consensus pattern: R-[LIVMFYS]-x-[LIVM]-x-[QHG]-x-G-C-[FYNA]-[GA]-G-[GA]-[STAV]-x-[LIVMF]-[RA] [C is the
active site residue]

[ 1] Schroeder J., Schroeder G.
Z. Naturforsch. 45C:1-8(1990).

[ 2] Lanz T., Tropf S., Marner F.-J., Schroeder J., Schroeder G.
J. Biol. Chem. 266:9971-9976(1991).

[0341] 93. Chorismate_synt (Chorismate synthase)

Number of members: 19

[0342] Chorismate synthase (EC 4.6.1.4) catalyzes the last of the seven steps in the shikimate pathway which is
used in prokaryotes, fungi and plants for the biosynthesis of aromatic amino acids. It catalyzes the 1,4-trans elimination
of the phosphate group from 5-enolpyruvylshikimate-3-phosphate (EPSP) to form chorismate which can then be used
in phenylalanine, tyrosine or tryptophan biosynthesis. Chorismate synthase requires the presence of a reduced flavin
mononucleotide (FMNH2 or FADH2) for its activity.

[0343] Chorismate synthase from various sources shows [1,2] a high degree of sequence conservation. It is a protein
of about 360 to 400 amino-acid residues. Three signature patterns have been developed from conserved regions rich
in basic residues (mostly arginines). The first is in the N-terminal section, the second is central and the third is C-terminal.

- Consensus pattern: G-E-S-H-[GC]-x(2)-[LIVM]-[GTV]-x-[LIVM](2)-[DE]-G-x-[PV]

- Consensus pattern: [GE]-R-[SA](2)-[SAG]-R-[EV]-[ST]-x(2)-[RH]-V-x(2)-G

- Consensus pattern: R-[SH]-D-[PSV]-[CSAV]-x(4)-[GAI]-x-[IVGSP]-[LIVM]-x-E-[STAH]-[LIVM]

[ 1] Schaller A., Schmid J., Leibinger U., Amrhein N.
J. Biol. Chem. 266:21434-21438(1991).

[ 2] Jones D.G.L., Reusser U., Braus G.H.
Mol. Microbiol. 5:2143-2152(1991).

[0344] 94. Clat_adaptor_s (Clathrin adaptor complex small chain)

Number of members: 21

[0345] Clathrin coated vesicles (CCV) mediate intracellular membrane traffic such as receptor mediated endocytosis.
In addition to clathrin, the CCV are composed of a number of other components including oligomeric complexes which
are known as adaptor or clathrin assembly proteins (AP) complexes [1]. The adaptor complexes are believed to interact
with the cytoplasmic tails of membrane proteins, leading to their selection and concentration. In mammals two types of
adaptor complexes are known: AP-1 which is associated with the Golgi complex and AP-2 which is associated with
the plasma membrane. Both AP-1 and AP-2 are heterotetramers that consist of two large chains - the adaptins -
(gamma and beta' in AP-1; alpha and beta in AP-2); a medium chain (AP47 in AP-1; AP50 in AP-2) and a small chain
(AP19 in AP-1; AP17 in AP-2).

[0346] The small chains of AP-1 and AP-2 are evolutionarily related proteins of about 18 Kd. Homologs of AP17 and
AP19 have also been found in yeast (genes APS1/YAP19 and APS2/YAP17) [2,3,4]. AP17 and AP19 are also related
to the zeta-chain [5] of coatomer (zeta-cop), a cytosolic protein complex that reversibly associates with Golgi
membranes to form vesicles that mediate biosynthetic protein transport from the endoplasmic reticulum, via the Golgi up
to the trans Golgi network.

[0347] A conserved region in the central section of these proteins has been selected as a signature pattern.
- Consensus pattern: [LIVM](2)-Y-[KR]-x(4)-L-Y-F

[ 1] Pearse B.M., Robinson M.S.
Annu. Rev. Cell Biol. 6:151-171(1990).
[ 2] Kirchhausen T., Davis A.C., Frucht S., O'Brine Greco B., Payne G.S., Tubb B.
J. Biol. Chem. 266:11153-11157(1991).
[ 3] Nakai M., Takada T., Endo T.
Biochim. Biophys. Acta 1174:282-284(1993).
[ 4] Phan H.L., Finlay J.A., Chu D.S., Tan P.K., Kirchhausen T., Payne G.S.
EMBO J. 13:1706-1717(1994).
[ 5] Kuge O., Hara-Kuge S., Orci L., Ravazzola M., Amherdt M., Tanigawa G., Wieland F.T., Rothman J.E.
J. Cell Biol. 123:1727-1734(1993).



[0348] 95. Clathrin_lg_ch (Clathrin light chain)

Number of members: 8

[0349] Clathrin [1,2] is the major coat-forming protein that encloses vesicles such as coated pits and forms cell
surface patches involved in membrane traffic within eukaryotic cells. The clathrin coats (called triskelions) are
composed of three heavy chains (180 Kd) and three light chains (23 to 27 Kd).
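As a worked illustration of the subunit composition just given, the approximate mass of one assembly of three heavy and three light chains can be computed from the stated chain sizes. The following Python sketch uses only the figures in the text; the constant and function names are hypothetical, and the result is a nominal sum of chain masses, not a measured value.

```python
# Chain sizes as stated in the text, in Kd.
HEAVY_CHAIN_KD = 180
LIGHT_CHAIN_KD_RANGE = (23, 27)
CHAINS_OF_EACH_TYPE = 3


def triskelion_mass_range_kd():
    """Return the (minimum, maximum) summed chain mass of one triskelion in Kd."""
    heavy_total = CHAINS_OF_EACH_TYPE * HEAVY_CHAIN_KD
    low = heavy_total + CHAINS_OF_EACH_TYPE * LIGHT_CHAIN_KD_RANGE[0]
    high = heavy_total + CHAINS_OF_EACH_TYPE * LIGHT_CHAIN_KD_RANGE[1]
    return low, high


low_kd, high_kd = triskelion_mass_range_kd()
```

With the stated sizes the sum is 3 x 180 + 3 x 23 = 609 Kd at the lower bound and 3 x 180 + 3 x 27 = 621 Kd at the upper bound, so the heavy chains account for most of the mass.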

[0350] The clathrin light chains [3], which may help to properly orient the assembly and disassembly of the clathrin
coats, bind non-covalently to the heavy chain; they also bind calcium and interact with the hsc70 uncoating ATPase.

- In higher eukaryotes two genes code for distinct but related light chains: LC(a) and LC(b). Each of the two genes
can yield, by tissue-specific alternative splicing, two separate forms which differ by the insertion of a sequence of
respectively thirty or eighteen residues. There is, in the N-terminal part of the clathrin light chains, a domain of
twenty one amino acid residues which is perfectly conserved in LC(a) and LC(b).

- In yeast there is a single light chain (gene CLC1) whose sequence is only distantly related to that of higher
eukaryotes.

[0351] Two signature patterns have been developed for clathrin light chains. The first pattern is a heptapeptide from
the center of the conserved N-terminal region of eukaryotic light chains; the second pattern is derived from a positively
charged region located in the C-terminal extremity of all known clathrin light chains.

- Consensus pattern: F-L-A-Q-Q-E-S



[ 1] Keen J.H.
Annu. Rev. Biochem. 59:415-438(1990).
[ 2] Brodsky F.M.
Science 242:1396-1402(1988).
[ 3] Brodsky F.M., Hill B.L., Acton S.L., Naethke I., Wong D.H., Ponnambalam S., Parham P.
Trends Biochem. Sci. 16:208-213(1991).



[0352] 96. (Clathrin repeat) 7-fold repeat in Clathrin and VPS

Each repeat is about 140 amino acids long. The repeats occur in the arm region of the Clathrin heavy chain.
Number of members: 79
[1]

Medline: 92191269

Folding and trimerization of clathrin subunits at the triskelion hub.
Nathke IS, Heuser J, Lupas A, Stock J, Turck CW, Brodsky FM;
Cell 1992;68:899-910.
[2]

Medline: 88097376

Clathrin heavy chain: molecular cloning and complete primary structure.
Kirchhausen T, Harrison SC, Chow EP, Mattaliano RJ,
Ramachandran KL, Smart J, Brosius J;
Proc Natl Acad Sci U S A 1987;84:8805-8809.
[0353] 97. Collagen (Collagen triple helix repeat (20 copies))
[1]

Medline: 94059583
New members of the collagen superfamily.
Mayne R, Brewton RG;
Curr Opin Cell Biol 1993;5:883-890.

Scurvy is associated with collagens.

Members of this family belong to the collagen superfamily [1].

Collagens are generally extracellular structural proteins involved in formation of connective tissue structure.
The alignment contains 20 copies of the G-X-Y repeat that forms a triple helix. The first position of the repeat is glycine,
the second and third positions can be any residue but are frequently proline and hydroxyproline. Collagens are
post-translationally modified by proline hydroxylase to form the hydroxyproline residues. Defective hydroxylation is the cause
of scurvy.

Some members of the collagen superfamily are not involved in connective tissue structure but share the same triple
helical structure.

Number of members: 2125
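The G-X-Y repeat described above can be located in a protein sequence by looking for runs of consecutive triplets that begin with glycine. The Python sketch below is one simple way to do so and is not the alignment procedure used to define the family. The function name and the test sequences are hypothetical, and hydroxyproline is assumed to appear as proline (P) in a translated sequence because the hydroxylation is post-translational.

```python
def longest_gxy_run(seq: str):
    """Return (start, repeats) for the longest run of consecutive G-X-Y triplets.

    start is the 0-based position of the first glycine of the run; X and Y may
    be any residue. Returns (-1, 0) when the sequence contains no full triplet.
    """
    seq = seq.upper()
    best_start, best_repeats = -1, 0
    for start in range(len(seq)):
        if seq[start] != "G":
            continue
        repeats = 0
        pos = start
        # Advance one triplet at a time while each triplet starts with glycine.
        while pos + 3 <= len(seq) and seq[pos] == "G":
            repeats += 1
            pos += 3
        if repeats > best_repeats:
            best_start, best_repeats = start, repeats
    return best_start, best_repeats


# Hypothetical test sequence: 20 copies of G-P-P flanked by unrelated residues.
toy_collagen = "MK" + "GPP" * 20 + "AA"
run = longest_gxy_run(toy_collagen)
```

A run of 20 or more repeats corresponds to the number of copies contained in the alignment described above; shorter runs may occur by chance in unrelated proteins.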

[0354] 98. Coprogen_oxidas (Coproporphyrinogen III oxidase)
Number of members: 12

Coproporphyrinogen III oxidase (EC 1.3.3.3) (coproporphyrinogenase) [1,2] catalyzes the oxidative decarboxylation
of coproporphyrinogen III into protoporphyrinogen IX, a common step in the pathway for the biosynthesis of porphyrins
such as heme, chlorophyll or cobalamin.

[0355] Coproporphyrinogen III oxidase is an enzyme that requires iron for its activity. A cysteine seems to be important
for the catalytic mechanism [3]. Sequences from a variety of eukaryotic and prokaryotic sources show that this enzyme
has been evolutionarily conserved. A highly conserved region in the central part of the sequence has been selected
as a signature pattern. This region contains the only conserved cysteine and is rich in charged amino acids.

- Consensus pattern: K-x-W-C-x(2)-[FYH](3)-[LIVM]-x-H-R-x-E-x-R-G-[LIVM]-G-G-[LIVM]-F-F-D



[ 1] Xu K., Elliott T.
J. Bacteriol. 175:4990-4999(1993).
[ 2] Kohno H., Furukawa T., Yoshinaga T., Tokunaga R., Taketani S.
J. Biol. Chem. 268:21359-21363(1993).
[ 3] Camadro J.M., Chambon H., Jolles J., Labbe P.
Eur. J. Biochem. 156:579-587(1986).
[ 4] Xu K., Elliott T.
J. Bacteriol. 176:3196-3203(1994).



[0356] 99. Corona_nucleoca (Coronavirus nucleocapsid protein)
[1]

Medline: 98087828

Identification of a specific interaction between the
coronavirus mouse hepatitis virus A59 nucleocapsid protein
and packaging signal.
Molenkamp R, Spaan WJ;
Virology 1997;239:78-86.

Number of members: 44

[0357] 100. Cu-oxidase (Multicopper oxidase)
[1]

Medline: 90126844
The blue oxidases, ascorbate oxidase, laccase and ceruloplasmin.
Modelling and structural relationships.
Messerschmidt A, Huber R;
Eur J Biochem 1990;187:341-352.
Number of members: 150

[0358] Multicopper oxidases [1,2] are enzymes that possess three spectroscopically different copper centers. These
centers are called: type 1 (or blue), type 2 (or normal) and type 3 (or coupled binuclear). The enzymes that belong to
this family are:

- Laccase (EC 1.10.3.2) (urishiol oxidase), an enzyme found in fungi and plants, which oxidizes many different types
of phenols and diamines.

- Ascorbate oxidase (EC 1.10.3.3), a higher plant enzyme.

- Ceruloplasmin (EC 1.16.3.1) (ferroxidase), a protein found in the serum of mammals and birds, which oxidizes a
great variety of inorganic and organic substances. Structurally ceruloplasmin exhibits internal sequence homology,
and seems to have evolved from the triplication of a copper-binding domain similar to that found in laccase and
ascorbate oxidase.

[0359] In addition to the above enzymes there are a number of proteins which, on the basis of sequence similarities,
can be said to belong to this family. These proteins are:

- Copper resistance protein A (copA) from a plasmid in Pseudomonas syringae. This protein seems to be involved
in the resistance of the microbial host to copper.

- Blood coagulation factor V (Fa V).
- Blood coagulation factor VIII (Fa VIII) [E1].

- Yeast FET3 [3], which is required for ferrous iron uptake.

- Yeast hypothetical protein YFL041w and SpAC1F7.08, the fission yeast homolog.

[0360] Factors V and VIII act as cofactors in blood coagulation and are structurally similar [4]. Their sequence consists
of a triplicated A domain, a B domain and a duplicated C domain, in the following order: A-A-B-A-C-C. The A-type
domain is related to the multicopper oxidases.

[0361] Two signature patterns have been developed for these proteins. Both patterns are derived from the same
region, which in ascorbate oxidase, laccase, in the third domain of ceruloplasmin, and in copA, contains five residues
that are known to be involved in the binding of copper centers. The first pattern does not make any assumption on the
presence of copper-binding residues and thus can detect domains that have lost the ability to bind copper (such as
those in Fa V and Fa VIII), while the second pattern is specific to copper-binding domains.

- Consensus pattern: G-x-[FYW]-x-[LIVMFYW]-x-[CST]-x(8)-G-[LM]-x(3)-[LIVMFYW]

- Consensus pattern: H-C-H-x(3)-H-x(3)-[AG]-[LM] [The first two H's are copper type 3 binding residues] [The C,
the 3rd H, and L or M are copper type 1 ligands]
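The distinction drawn between the two patterns, one general and one specific to copper-binding domains, can be expressed as a simple two-step test. The Python sketch below hand-translates both patterns into regular expressions and reports which of them a sequence matches. The names and the toy sequences are hypothetical; the toy sequences are built only to exercise each pattern separately and do not represent real multicopper oxidase domains.

```python
import re

# Hand-translated from the two consensus patterns in the text.
FAMILY_PATTERN = re.compile(r"G.[FYW].[LIVMFYW].[CST].{8}G[LM].{3}[LIVMFYW]")
COPPER_PATTERN = re.compile(r"HCH.{3}H.{3}[AG][LM]")


def classify_domain(seq: str) -> dict:
    """Report which of the two multicopper oxidase signatures occur in seq."""
    seq = seq.upper()
    return {
        "family_signature": FAMILY_PATTERN.search(seq) is not None,
        "copper_signature": COPPER_PATTERN.search(seq) is not None,
    }


# Hypothetical toy sequences.
general_only = classify_domain("MGAFALACAAAAAAAAGLAAALK")
copper_only = classify_domain("MHCHAAAHAAAALK")
```

Under the reading given in the text, a domain matching only the first pattern would be a candidate for a family member that has lost copper binding, as in Fa V and Fa VIII, whereas a match to the second pattern would suggest that the copper ligands are retained.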






[0362] 101. Cullin (Cullin family)
Number of members: 24

[0363] The following proteins are collectively termed cullins [1]:

- Caenorhabditis elegans cul-1 (or lin-19), a protein required for developmentally programmed transitions from the
G1 phase of the cell cycle to the G0 phase or the apoptotic pathway.

- Caenorhabditis elegans cul-2, cul-3, cul-4 (F45E12.3), cul-5 (ZK856.1) and cul-6 (K08E7.7).

- Mammalian CUL1, CUL2, CUL3, CUL4A and CUL4B.

- Mammalian vasopressin-activated calcium-mobilizing receptor (VACM-1), a kidney-specific protein thought to form
a cell surface receptor [2] but which does not have any structural hallmarks of a receptor.

- Drosophila lin19.

- Yeast CDC53 [3], which acts in concert with CDC4 and UBC3 (CDC34) to control the G1-to-S phase transition.

- Yeast hypothetical protein YGR003w.

- Fission yeast hypothetical protein SpAC24H6.03.

[0364] The cullins are hydrophilic proteins of 740 to 815 amino acids. The C-terminal extremity is the most conserved
part of these proteins. A signature pattern has been developed from that region.

- Consensus pattern: [LIV]-K-x(2)-[LIV]-x(2)-L-I-[DEQ]-[KRHNQ]-x-Y-[LIVM]-x-R-x(6,7)-[FY]-x-Y-x-[SA]>
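The trailing ">" in the cullin pattern indicates that the signature must end at the C-terminus of the protein, and "x(6,7)" allows six or seven arbitrary residues. A short Python sketch of how both features behave in a regular expression follows. The regular expression is a hand translation of the pattern above, and the function name and toy sequences are hypothetical.

```python
import re

# "$" plays the role of the ">" anchor; ".{6,7}" plays the role of x(6,7).
CULLIN_SIGNATURE = re.compile(
    r"[LIV]K.{2}[LIV].{2}LI[DEQ][KRHNQ].Y[LIVM].R.{6,7}[FY].Y.[SA]$"
)


def has_c_terminal_cullin_signature(seq: str) -> bool:
    """True when the signature occurs and ends exactly at the last residue."""
    return CULLIN_SIGNATURE.search(seq.upper()) is not None


# Hypothetical toy sequences: the same motif at the C-terminus and followed by two extra residues.
motif = "LKAAIAALIDKAYLARAAAAAAFAYAA"
at_c_terminus = has_c_terminal_cullin_signature("MSTT" + motif)
not_at_c_terminus = has_c_terminal_cullin_signature("MSTT" + motif + "GG")
```

The second toy sequence contains the same residues but is rejected because the motif no longer ends at the last residue, which is the behaviour the ">" anchor is intended to give.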



[ 1] Kipreos E.T., Lander L.E., Wing J.P., He W.W., Hedgecock E.M.
Cell 85:829-839(1996).
[ 2] Burnatowska-Hledin M.A., Spielman W.S., Smith W.L., Shi P., Meyer J.M., Dewitt D.L.
Am. J. Physiol. 268:F1198-F1210(1995).
[ 3] Mathias N., Johnson S.L., Winey M., Adams A.E., Goetsch L., Pringle J.R., Byers B., Goebl M.G.
Mol. Cell. Biol. 16:6634-6643(1996).

[0365] 102. (Cu_amine_oxid)
Copper amine oxidase signatures
Amine oxidases (AO) [1] are enzymes that catalyze the oxidation of a wide range of biogenic amines including many
neurotransmitters, histamine and xenobiotic amines. There are two classes of amine oxidases: flavin-containing (EC
1.4.3.4) and copper-containing (EC 1.4.3.6).

[0366] Copper-containing AO is found in bacteria, fungi, plants and animals. It is a homodimeric enzyme that binds
one copper ion per subunit as well as a 2,4,5-trihydroxyphenylalanine quinone (or topaquinone) (TPQ) cofactor. This
cofactor is derived from a tyrosine residue.

[0367] Two signature patterns were derived for copper AO: the first one contains the tyrosine which gives rise to the
TPQ cofactor, while the second one contains one of the three histidines that bind the copper atom [2].
[0368] Consensus pattern: [LIVM]-[LIVMA]-[LIVMF]-x(4)-[ST]-x(2)-N-Y-[DE]-[YN] [The first Y gives rise to TPQ]
Sequences known to belong to this class detected by the pattern: ALL.

[0369] Consensus pattern: T-x-[GS]-x(2)-H-[LIVMF]-x(3)-E-[DE]-x-P [H is a copper ligand] Sequences known to
belong to this class detected by the pattern: ALL, except for lentil AO.

[ 1] Knowles P.F., Dooley D.M. (In) Metal ions in biological systems; Sigel H., Sigel A., Eds., 30:361-403, Marcel
Dekker, New-York, (1993).

[ 2] Parsons M.R., Convery M.A., Wilmot C.M., Yadav K.D.S., Blakeley V., Corner A.S., Phillips S.E.V., McPherson
M.J., Knowles P.F. Structure 3:1171-1184(1995).

[0370] 103. Cys-protease (Cysteine protease) 
Number of members: 358 

25 [0371] Eukaryotic thiol proteases (EC 3.4.22.-) [1] are a family of proteolytic enzymes which contain an active site 
cysteine. Catalysis proceeds through a thioester intermediate and is facilitated by a nearby histldine side chain; an 
asparagine completes the essential catalytic triad. The proteases which are currently known to belong to this family 
are listed below (references are only provided for recently determined sequences). 

- Vertebrate lysosomal cathepsins B (EC 3.4.22.1), H (EC 3.4.22.16), L (EC 3.4.22.15), and S (EC 3.4.22.27) [2].
- Vertebrate lysosomal dipeptidyl peptidase I (EC 3.4.14.1) (also known as cathepsin C) [2].
- Vertebrate calpains (EC 3.4.22.17). Calpains are intracellular calcium-activated thiol proteases that contain both an N-terminal catalytic domain and a C-terminal calcium-binding domain.
- Mammalian cathepsin K, which seems involved in osteoclastic bone resorption [3].
- Human cathepsin O [4].
- Bleomycin hydrolase, an enzyme that catalyzes the inactivation of the antitumor drug BLM (a glycopeptide).
- Plant enzymes: barley aleurain (EC 3.4.22.16), EP-B1/B4; kidney bean EP-C1, rice bean SH-EP; kiwi fruit actinidin (EC 3.4.22.14); papaya latex papain (EC 3.4.22.2), chymopapain (EC 3.4.22.6), caricain (EC 3.4.22.30), and proteinase IV (EC 3.4.22.25); pea turgor-responsive protein 15A; pineapple stem bromelain (EC 3.4.22.32); rape COT44; rice oryzain alpha, beta, and gamma; tomato low-temperature induced; Arabidopsis thaliana A494, RD19A and RD21A.
- House-dust mite allergens DerP1 and EurM1.
- Cathepsin B-like proteinases from the worms Caenorhabditis elegans (genes gcp-1, cpr-3, cpr-4, cpr-5 and cpr-6), Schistosoma mansoni (antigen SM31) and japonicum (antigen SJ31), Haemonchus contortus (genes AC-1 and AC-2), and Ostertagia ostertagi (CP-1 and CP-3).
- Slime mold cysteine proteinases CP1 and CP2.
- Cruzipain from Trypanosoma cruzi and brucei.
- Trophozoite cysteine proteinase (TCP) from various Plasmodium species.
- Proteases from Leishmania mexicana, Theileria annulata and Theileria parva.
- Baculovirus cathepsin-like enzyme (v-cath).
- Drosophila small optic lobes protein (gene sol), a neuronal protein that contains a calpain-like domain.
- Yeast thiol protease BLH1/YCP1/LAP3.
- Caenorhabditis elegans hypothetical protein C06G4.2, a calpain-like protein.

[0372] Two bacterial peptidases are also part of this family:

- Aminopeptidase C from Lactococcus lactis (gene pepC) [5].
- Thiol protease tpr from Porphyromonas gingivalis.
[0373] Three other proteins are structurally related to this family, but may have lost their proteolytic activity:

- Soybean oil body protein P34. This protein has its active site cysteine replaced by a glycine.
- Rat testin, a Sertoli cell secretory protein highly similar to cathepsin L but with the active site cysteine replaced by a serine. Rat testin should not be confused with mouse testin, which is a LIM-domain protein (see <PDOC00382>).
- Plasmodium falciparum serine-repeat protein (SERA), the major blood stage antigen. This protein of 111 Kd possesses a C-terminal thiol-protease-like domain [6], but the active site cysteine is replaced by a serine.

[0374] The sequences around the three active site residues are well conserved and can be used as signature patterns.

- Consensus pattern: Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]-[STAGCV] [C is the active site residue]

- Consensus pattern: [LIVMGSTAN]-x-H-[GSACE]-[LIVM]-x-[LIVMAT](2)-G-x-[GSADNH] [H is the active site residue]

- Consensus pattern: [FYCH]-[WI]-[LIVT]-x-[KRQAG]-N-[ST]-W-x(3)-[FYW]-G-x(2)-G-[LFYW]-[LIVMFYG]-x-[LIVMF] [N is the active site residue]

[ 1] Dufour E. Biochimie 70:1335-1342(1988).
[ 2] Kirschke H., Barrett A.J., Rawlings N.D. Protein Prof. 2:1587-1643(1995).
[ 3] Shi G.-P., Chapman H.A., Bhairi S.M., Deleeuw C., Reddy V.Y., Weiss S.J. FEBS Lett. 357:129-134(1995).
[ 4] Velasco G., Ferrando A.A., Puente X.S., Sanchez L.M., Lopez-Otin C. J. Biol. Chem. 269:27136-27142(1994).
[ 5] Chapot-Chartier M.P., Nardi M., Chopin M.C., Chopin A., Gripon J.C. Appl. Environ. Microbiol. 59:330-333(1993).
[ 6] Higgins D.G., McConnell D.J., Sharp P.M. Nature 340:604-604(1989).
[ 7] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994).

[0375] 104. Cys_Met_Meta_PP (Cys/Met metabolism PLP-dependent enzyme)
[1] Medline: 96428687
Crystal structure of the pyridoxal-5'-phosphate dependent cystathionine beta-lyase from Escherichia coli at 1.83 A.
Clausen T, Huber R, Laber B, Pohlenz HD, Messerschmidt A;
J Mol Biol 1996;262:202-224.
[2] Medline: 99059720
Crystal structure of Escherichia coli cystathionine gamma-synthase at 1.5 A resolution.
Clausen T, Huber R, Prade L, Wahl MC, Messerschmidt A;
EMBO J 1998;17:6827-6838.
Database Reference: SCOP; 1cs1; fa; [SCOP-USA][CATH-PDBSUM]
This family includes enzymes involved in cysteine and methionine metabolism. The following are members:

- Cystathionine gamma-lyase
- Cystathionine gamma-synthase
- Cystathionine beta-lyase
- Methionine gamma-lyase
- OAH/OAS sulfhydrylase
- O-succinylhomoserine sulfhydrylase

All of these members participate in slightly different reactions.
All these enzymes use PLP (pyridoxal-5'-phosphate) as a cofactor.
Number of members: 52

[0376] A number of pyridoxal-dependent enzymes involved in the metabolism of cysteine, homocysteine and methionine have been shown [1,2] to be evolutionarily related. These are:

- Cystathionine gamma-lyase (EC 4.4.1.1) (gamma-cystathionase), which catalyzes the transformation of cystathionine into cysteine, oxobutanoate and ammonia. This is the final reaction in the transsulfuration pathway that leads from methionine to cysteine in eukaryotes.
- Cystathionine gamma-synthase (EC 4.2.99.9), which catalyzes the conversion of cysteine and succinyl-homoserine into cystathionine and succinate: the first step in the biosynthesis of methionine from cysteine in bacteria (gene metB).
- Cystathionine beta-lyase (EC 4.4.1.8) (beta-cystathionase), which catalyzes the conversion of cystathionine into homocysteine, pyruvate and ammonia: the second step in the biosynthesis of methionine from cysteine in bacteria (gene metC).
- Methionine gamma-lyase (EC 4.4.1.11) (L-methioninase), which catalyzes the transformation of methionine into methanethiol, oxobutanoate and ammonia.
- OAH/OAS sulfhydrylase, which catalyzes the conversion of acetylhomoserine into homocysteine and that of acetylserine into cysteine (gene MET17 or MET25 in yeast).
- O-succinylhomoserine sulfhydrylase (EC 4.2.99.-).
- Yeast hypothetical protein YGL184c.
- Yeast hypothetical protein YHR112c.

[0377] These enzymes are proteins of about 400 amino-acid residues. The pyridoxal-P group is attached to a lysine residue located in the central section of these enzymes; the sequence around this residue is highly conserved and can be used as a signature pattern to detect this class of enzymes.

- Consensus pattern: [DQ]-[LIVMF]-x(3)-[STAGC]-[STAGCI]-T-K-[FYWQ]-[LIVMF]-x-G-[HQ]-[SGNH] [K is the pyridoxal-P attachment site]

[ 1] Ono B.I., Tanaka K., Naito K., Heike C., Shinoda S., Yamamoto S., Ohmori S., Oshima T., Toh-E A. J. Bacteriol. 174:3339-3347(1992).

[ 2] Barton A.B., Kaback D.B., Clark M.W., Keng T., Ouellette B.F.F., Storms R.K., Zeng B., Zhong W.W., Fortin N., Delaney S., Bussey H. Yeast 9:363-369(1993).

[0378] 105. Cyt.reductase
FAD/NAD-binding Cytochrome reductase
Number of members: 60
[1] Medline: 95111952
Crystal structure of the FAD-containing fragment of corn nitrate reductase at 2.5 A resolution: relationship to other flavoprotein reductases.
Lu G, Campbell WH, Schneider G, Lindqvist Y;
Structure 1994;2:809-821.
[2] Medline: 92084635
The sequence of squash NADH:nitrate reductase and its relationship to the sequences of other flavoprotein oxidoreductases. A family of flavoprotein pyridine nucleotide cytochrome reductases.
Hyde GE, Crawford NM, Campbell WH;
J Biol Chem 1991;266:23542-23547.
[0379] 106. Cytidylyltrans
Phosphatidate cytidylyltransferase
Number of members: 21

[0380] Phosphatidate cytidylyltransferase (EC 2.7.7.41) [1,2,3] (also known as CDP-diacylglycerol synthase) (CDS) is the enzyme that catalyzes the synthesis of CDP-diacylglycerol from CTP and phosphatidate (PA). CDP-diacylglycerol is an important branch point intermediate in both prokaryotic and eukaryotic organisms. CDS is a membrane-bound enzyme. A conserved region located in the C-terminal part has been selected as a signature pattern.

- Consensus pattern: S-x-[LIVMF]-K-R-x(4)-K-D-x-[GSA]-x(2)-[LI]-[PG]-x-H-G-G-[LIVM]-x-D-R-[LIVMF]-D

[ 1] Sparrow C.P., Raetz C.R.H. J. Biol. Chem. 260:12084-12091(1985).
[ 2] Shen H., Heacock P.N., Clancey C.J., Dowhan W. J. Biol. Chem. 271:789-795(1996).
[ 3] Saito S., Goto K., Tonosaki A., Kondo H. J. Biol. Chem. 272:9503-9509(1997).
[0381] 107. (Cytidylyltransf) Cytidylyltransferase. This family includes: Cholinephosphate cytidylyltransferase. Glycerol-3-phosphate cytidylyltransferase.
Number of members: 64

[0382] [1] Medline: 10208837 CTP:Phosphocholine Cytidylyltransferase: Insights into Regulatory Mechanisms and Novel Functions. Clement JM, Kent C; Biochem Biophys Res Commun 1999;257:643-650.

[0383] 108. (cNMP binding) Cyclic nucleotide-binding domain signatures and profile
Proteins that bind cyclic nucleotides (cAMP or cGMP) share a structural domain of about 120 residues [1-3]. The best studied of these proteins is the prokaryotic catabolite gene activator (also known as the cAMP receptor protein) (gene crp), where such a domain is known to be composed of three alpha-helices and a distinctive eight-stranded, antiparallel beta-barrel structure. Such a domain is known to exist in the following proteins:

- Prokaryotic catabolite gene activator protein (CAP).
- cAMP- and cGMP-dependent protein kinases (cAPK and cGPK). Both types of kinases contain two tandem copies of the cyclic nucleotide-binding domain. The cAPK's are composed of two different subunits: a catalytic chain and a regulatory chain which contains both copies of the domain. The cGPK's are single chain enzymes that include the two copies of the domain in their N-terminal section. The nucleotide specificity of cAPK and cGPK is due to an amino acid in the conserved region of beta-barrel 7: a threonine that is invariant in cGPK is an alanine in most cAPK.
- Vertebrate cyclic nucleotide-gated ion channels. Two such cation channels have been fully characterized. One is found in rod cells, where it plays a role in visual signal transduction. It specifically binds cGMP, leading to an opening of the channel and thereby causing a depolarization of rod photoreceptors. In olfactory epithelium a similar, cAMP-binding, channel plays a role in odorant signal transduction.

There are six invariant amino acids in this domain, three of which are glycine residues that are thought to be essential for maintenance of the beta-barrel. Two signature patterns for this domain have been developed. The first pattern is located within beta-barrels 2 and 3 and contains the first two conserved Gly. The second pattern is located within beta-barrels 6 and 7 and contains the third conserved Gly as well as the three other invariant residues.

First consensus pattern: [LIVM]-[VIC]-x(2)-G-[DENQTA]-x-[GAC]-x(2)-[LIVMFY](4)-x(2)-G

Second consensus pattern: [LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R-[STAQ]-A-x-[LIVMA]-x-[STACV]

[ 1] Weber I.T., Shabb J.B., Corbin J.D. Biochemistry 28:6122-6127(1989).

[ 2] Kaupp U.B. Trends Neurosci. 14:150-157(1991).

[ 3] Shabb J.B., Corbin J.D. J. Biol. Chem. 267:5723-5726(1992).
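
The second consensus pattern above contains a variable-length gap, written x(5,11), meaning any 5 to 11 residues. The sketch below, offered as an assumption-laden illustration rather than as the method used to derive the pattern, shows how such a range can be carried over into a regular expression. The helper `prosite_to_regex` and the poly-Asp test peptides are hypothetical.

```python
import re

def prosite_to_regex(pattern):
    """Translate PROSITE-style elements into a regex; x(m,n) becomes .{m,n}."""
    out = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError("unrecognised pattern element: %r" % element)
        residue, low, high = m.groups()
        token = "." if residue == "x" else residue
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        out.append(token)
    return "".join(out)

SECOND_CNMP = "[LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R-[STAQ]-A-x-[LIVMA]-x-[STACV]"
regex = re.compile(prosite_to_regex(SECOND_CNMP))

def toy_peptide(gap_length):
    """Hypothetical peptide whose flanks fit the pattern, with a poly-Asp gap."""
    return "FGELAL" + "D" * gap_length + "RSATVKA"

# Only gap lengths inside the 5..11 window should produce a match.
matching_gaps = [n for n in range(3, 14) if regex.search(toy_peptide(n))]
print(matching_gaps)
```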


[0384] 109. (cadherin)
Cadherins extracellular repeated domain signature
Cadherins [1,2] are a family of animal glycoproteins responsible for calcium-dependent cell-cell adhesion. Cadherins preferentially interact with themselves in a homophilic manner in connecting cells, thus acting as both receptor and ligand. A wide number of tissue-specific forms of cadherins are known:

- Epithelial (E-cadherin) (also known as uvomorulin or L-CAM) (CDH1).
- Neural (N-cadherin) (CDH2).
- Placental (P-cadherin) (CDH3).
- Retinal (R-cadherin) (CDH4).
- Vascular endothelial (VE-cadherin) (CDH5).
- Kidney (K-cadherin) (CDH6).
- Cadherin-8 (CDH8).
- Osteoblast (OB-cadherin) (CDH11).
- Brain (BR-cadherin) (CDH12).
- T-cadherin (truncated cadherin) (CDH13).
- Muscle (M-cadherin) (CDH14).
- Liver-intestine (LI-cadherin).
- EP-cadherin.

[0385] Structurally, cadherins are built of the following domains: a signal sequence, followed by a propeptide of about 130 residues, then an extracellular domain of around 600 residues, then a transmembrane region, and finally a C-terminal cytoplasmic domain of about 150 residues. The extracellular domain can be subdivided into five parts: there are four repeats of about 110 residues followed by a region that contains four conserved cysteines. It is suggested that the calcium-binding region of cadherins is located in the extracellular repeats.

[0386] Cadherins are evolutionarily related to the desmogleins, which are components of intercellular desmosome junctions involved in the interaction of plaque proteins:

- Desmoglein 1 (desmosomal glycoprotein I).
- Desmoglein 2.
- Desmoglein 3 (Pemphigus vulgaris antigen).

[0387] The Drosophila fat protein [3] is a huge protein of over 5000 amino acids that contains 34 cadherin-like repeats in its extracellular domain.

[0388] The signature pattern that was developed for the repeated domain is located in the C-terminal extremity, which is its best conserved region. The pattern includes two conserved aspartic acid residues as well as two asparagines; these residues could be implicated in the binding of calcium.
[0389] Consensus pattern: [LIV]-x-[LIV]-x-D-x-N-D-[NH]-x-P Sequences known to belong to this class detected by the pattern: ALL. Note: this pattern is found in the first, second, and fourth copies of the repeated domain. In the third copy there is a deletion of one residue after the second conserved Asp.

[ 1] Takeichi M. Annu. Rev. Biochem. 59:237-252(1990).
[ 2] Takeichi M. Trends Genet. 3:213-217(1987).

[ 3] Mahoney P.A., Weber U., Onofrechuk P., Biessmann H., Bryant P.J., Goodman C.S. Cell 67:853-868(1991).
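
The note that the cadherin pattern is found in three of the four repeat copies can be illustrated with a short scan for all non-overlapping matches. This is a sketch on invented data: the repeat fragments and the `GGGG` linker are hypothetical stand-ins, with the third fragment lacking one residue after the second conserved Asp, as the note describes.

```python
import re

# Regular-expression form of [LIV]-x-[LIV]-x-D-x-N-D-[NH]-x-P
CADHERIN = re.compile(r"[LIV].[LIV].D.ND[NH].P")

# Hypothetical repeat ends: three intact copies and one (the third) with a
# single residue deleted after the second conserved Asp.
intact = "VAVKDANDNAP"
deleted = "VAVKDANDAP"
toy_domain = "GGGG".join([intact, intact, deleted, intact])

# Start offsets (0-based) of every copy that satisfies the signature.
hits = [m.start() for m in CADHERIN.finditer(toy_domain)]
print(len(hits), hits)
```

On this toy input the scan reports three hits and skips the shortened third copy, which is the behaviour the note leads one to expect.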

[0390] 110. Calreticulin family signatures
Calreticulin [1] (also known as calregulin, CRP55 or HACBP) is a high-capacity calcium-binding protein which is present in most tissues and located at the periphery of the endoplasmic (ER) and the sarcoplasmic reticulum (SR) membranes. It probably plays a role in the storage of calcium in the lumen of the ER and SR and it may well have other important functions. Structurally, calreticulin is a protein of about 400 amino acid residues consisting of three domains:

a) An N-terminal, probably globular, domain of about 180 amino acid residues (N-domain);
b) A central domain of about 70 residues (P-domain) which contains three repeats of an acidic 17 amino acid motif. This region binds calcium with a low capacity but a high affinity;
c) A C-terminal domain rich in acidic residues and in lysine (C-domain). This region binds calcium with a high capacity but a low affinity.

Calreticulin is evolutionarily related to the following proteins:

- Onchocerca volvulus antigen RAL-1. RAL-1 is highly similar to calreticulin, but possesses a C-terminal domain rich in lysine and arginine and lacks acidic residues and is therefore not expected to bind calcium in that region.
- Calnexin [2]. A calcium-binding protein that interacts with newly synthesized glycoproteins in the endoplasmic reticulum. It seems to play a major role in the quality control apparatus of the ER by the retention of incorrectly folded proteins.
- Calmegin [3] (or calnexin-T), a testis-specific calcium-binding protein highly similar to calnexin.

Three signature patterns have been developed for this family of proteins. The first two patterns are based on conserved regions in the N-domain; the third pattern corresponds to positions 4 to 16 of the repeated motif in the P-domain.

Consensus pattern: [KRHN]-x-[DEQN]-[DEQNK]-x(3)-C-G-G-[AG]-[FY]-[LIVM]-[KN]-[LIVMFY](2)

Consensus pattern: [LIVM](2)-F-G-P-D-x-C-[AG]

Consensus pattern: [IV]-x-D-x-[DENST]-x(2)-K-P-[DEH]-D-W-[DEN]

[ 1] Michalak M., Milner R.E., Burns K., Opas M. Biochem. J. 285:681-692(1992).

[ 2] Bergeron J.J.M., Brenner M.B., Thomas D.Y., Williams D.B. Trends Biochem. Sci. 19:124-128(1994).
[ 3] Watanabe D., Yamada K., Nishina Y., Tajima Y., Koshimizu U., Nagata A., Nishimune Y. J. Biol. Chem. 269:7744-7749(1994).

[0391] 111. Eukaryotic-type carbonic anhydrases signature (carb_anhydrase)
Carbonic anhydrases (EC 4.2.1.1) (CA) [1,2,3,4] are zinc metalloenzymes which catalyze the reversible hydration of carbon dioxide. Eight enzymatic and evolutionarily related forms of carbonic anhydrase are currently known to exist in vertebrates: three cytosolic isozymes (CA-I, CA-II and CA-III); two membrane-bound forms (CA-IV and CA-VII); a mitochondrial form (CA-V); a secreted salivary form (CA-VI); and a yet uncharacterized isozyme [5]. In the alga Chlamydomonas reinhardtii, two CA isozymes have been sequenced [6]. They are periplasmic glycoproteins evolutionarily related to vertebrate CAs. Some bacteria, such as Neisseria gonorrhoeae [7], also have a eukaryotic-type CA. CAs contain a single zinc atom bound to three conserved histidine residues. As a signature for CAs, a pattern has been developed which includes one of these zinc-binding histidines. Protein D8 from Vaccinia and other poxviruses is related to CAs but has lost two of the zinc-binding histidines as well as many otherwise conserved residues. This is also true of the N-terminal extracellular domain of some receptor-type tyrosine-protein phosphatases (see <PDOC00323>).

Consensus pattern: S-E-[HN]-x-[LIVM]-x(4)-[FYH]-x(2)-E-[LIVMGA]-H-[LIVMFA](2) [The second H is a zinc ligand]
Note: most prokaryotic CAs as well as plant chloroplast CAs belong to another, evolutionarily distinct family of proteins (see <PDOC00586>).
[ 1] Deutsch H.F. Int. J. Biochem. 19:101-113(1987).
[ 2] Fernley R.T. Trends Biochem. Sci. 13:356-359(1988).
[ 3] Tashian R.E. BioEssays 10:186-192(1989).
[ 4] Edwards Y. Biochem. Soc. Trans. 18:171-175(1990).
[ 5] Skaggs L.A., Bergenhem N.C.H., Venta P.J., Tashian R.E. Gene 126:291-292(1993).

[ 6] Fujiwara S., Fukuzawa H., Tachiki A., Miyachi S. Proc. Natl. Acad. Sci. U.S.A. 87:9779-9783(1990).

[ 7] Huang S., Xue Y., Sauer-Eriksson E., Chirica L., Lindskog S., Jonsson B.H. J. Mol. Biol. 283:301-310(1998).

[0392] 112. Caseins alpha/beta signature
Caseins [1] are the major protein constituent of milk. Caseins can be classified into two families: the first consists of the kappa-caseins, and the second groups the alpha-s1, alpha-s2, and beta-caseins. The alpha/beta caseins are a rapidly diverging family of proteins. However, two regions are conserved: a cluster of phosphorylated serine residues and the signal sequence. The signature pattern has been developed for this family of proteins based upon the last eight residues of the signal sequence.

Consensus pattern: C-L-[LV]-A-x-A-[LVF]-A

[ 1] Holt C., Sawyer L. Protein Eng. 2:251-259(1988).

[0393] 113. Catalase signatures
Catalase (EC 1.11.1.6) [1,2,3] is an enzyme, present in all aerobic cells, that decomposes hydrogen peroxide to molecular oxygen and water. Its main function is to protect cells from the toxic effects of hydrogen peroxide. In eukaryotic organisms and in some prokaryotes catalase is a molecule composed of four identical subunits. Each of the subunits binds one protoheme IX group. A conserved tyrosine serves as the heme proximal side ligand. The region around this residue has been used as a first signature pattern; it also includes a conserved arginine that participates in heme-binding. A conserved histidine has been shown to be important for the catalytic mechanism of the enzyme. The region around this residue has been selected as a second signature pattern.

Consensus pattern: R-[LIVMFSTAN]-F-[GASTNP]-Y-x-D-[AST]-[QEH] [Y is the proximal heme-binding ligand]
Consensus pattern: [IF]-x-[RH]-x(4)-[EQ]-R-x(2)-H-x(2)-[GAS]-[GASTF]-[GAST] [H is an active site residue]
Note: some prokaryotic catalases belong to the peroxidase family (see <PDOC00394>).

[ 1] Murthy M.R.N., Reid T.J. III, Sicignano A., Tanaka N., Rossmann M.G. J. Mol. Biol. 152:465-499(1981).

[ 2] Melik-Adamyan W.R., Barynin V.V., Vagin A.A., Borisov V.V., Vainshtein B.K., Fita I., Murthy M.R.N., Rossmann M.G. J. Mol. Biol. 188:63-72(1986).

[ 3] von Ossowski I., Hausner G., Loewen P.C. J. Mol. Evol. 37:71-76(1993).
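
Each catalase pattern above singles out one functionally important residue (the heme-binding Y or the active-site H). One way to report where that residue falls in a matched sequence is to wrap the corresponding pattern element in a capture group. The sketch below assumes a hypothetical helper `prosite_to_regex` with an illustrative `mark` parameter, and a test peptide invented to fit the second pattern; neither comes from the source.

```python
import re

def prosite_to_regex(pattern, mark=None):
    """Translate a PROSITE-style pattern; wrap element number `mark` (0-based) in a group."""
    out = []
    for index, element in enumerate(pattern.split("-")):
        m = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError("unrecognised pattern element: %r" % element)
        residue, low, high = m.groups()
        token = "." if residue == "x" else residue
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        out.append("(" + token + ")" if index == mark else token)
    return "".join(out)

# Second catalase signature; element 7 (0-based) is the active-site H.
CATALASE_2 = "[IF]-x-[RH]-x(4)-[EQ]-R-x(2)-H-x(2)-[GAS]-[GASTF]-[GAST]"
regex = prosite_to_regex(CATALASE_2, mark=7)

# Hypothetical peptide built to fit the pattern (not a real catalase).
toy_sequence = "AAFSRAAAAERAAHAAGGGAA"
hit = re.search(regex, toy_sequence)
active_site = hit.start(1) + 1  # 1-based position of the marked residue
print(regex, active_site)
```

Using a capture group rather than a fixed offset keeps the position correct even when earlier elements of a pattern have variable length.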

[0394] 114. (chitin binding) Chitin recognition or binding domain signature
A conserved domain of 43 amino acids is found in several plant and fungal proteins that have a common binding specificity for oligosaccharides of N-acetylglucosamine [1]. This domain may be involved in the recognition or binding of chitin subunits. It has been found in the proteins listed below.

- A number of non-leguminous plant lectins. The best characterized of these lectins are the three highly homologous wheat germ agglutinins (WGA-1, 2 and 3). WGA is an N-acetylglucosamine/N-acetylneuraminic acid binding lectin which structurally consists of a fourfold repetition of the 43 amino acid domain. The same type of structure is found in a barley root-specific lectin as well as a rice lectin.
- Plant endochitinases (EC 3.2.1.14) from class IA (see <PDOC00620>). Endochitinases are enzymes that catalyze the hydrolysis of the beta-1,4 linkages of N-acetyl glucosamine polymers of chitin. Plant chitinases function as a defense against chitin-containing fungal pathogens. Class IA chitinases generally contain one copy of the chitin-binding domain at their N-terminal extremity. An exception is agglutinin/chitinase [2] from the stinging nettle Urtica dioica, which contains two copies of the domain.
- Hevein [5], a wound-induced protein found in the latex of rubber trees.
- Win1 and win2, two wound-induced proteins from potato.
- Kluyveromyces lactis killer toxin alpha subunit [3]. The toxin encoded by the linear plasmid pGKL1 is composed of three subunits: alpha, beta, and gamma. The gamma subunit harbors toxin activity and inhibits growth of sensitive yeast strains in the G1 phase of the cell cycle; the alpha subunit, which is proteolytically processed from a larger precursor that also contains the beta subunit, is a chitinase (see <PDOC00839>).

In chitinases, as well as in the potato wound-induced proteins, the 43-residue domain directly follows the signal sequence and is therefore at the N-terminal of the mature protein; in the killer toxin alpha subunit it is located in the central section of the protein. The domain contains eight conserved cysteine residues which have all been shown, in WGA, to be involved in disulfide bonds. The topological arrangement of the four disulfide bonds is shown in the following figure:

xxCgxxxxxxxCxxxxCCsxxgxCgxxxxxCxxxCxxxxC
[Figure: disulfide-bond connectivity diagram]
'C': conserved cysteine involved in a disulfide bond. '*': position of the pattern.
Consensus pattern: C-x(4,5)-C-C-S-x(2)-G-x-C-G-x(4)-[FYW]-C [The five C's are involved in disulfide bonds]

[ 1] Wright H.T., Sandrasegaram G., Wright C.S. J. Mol. Evol. 33:283-294(1991).
[ 2] Lerner D.R., Raikhel N.V. J. Biol. Chem. 267:11085-11091(1992).
[ 3] Butler A.R., O'Donnell R.W., Martin V.J., Gooday G.W., Stark M.J.R. Eur. J. Biochem. 199:483-488(1991).

[0395] 115. (Chitinase 1) Chitinases family 19 signatures
Chitinases (EC 3.2.1.14) [1] are enzymes that catalyze the hydrolysis of the beta-1,4-N-acetyl-D-glucosamine linkages in chitin polymers. From the viewpoint of sequence similarity, chitinases belong to either family 18 or 19 in the classification of glycosyl hydrolases [2,E1]. Chitinases of family 19 (also known as classes IA or I and IB or II) are enzymes from plants that function in the defense against fungal and insect pathogens by destroying their chitin-containing cell wall. Class IA/I and IB/II enzymes differ in the presence (IA/I) or absence (IB/II) of an N-terminal chitin-binding domain (see the relevant entry <PDOC00025>). The catalytic domain of these enzymes consists of about 220 to 230 amino acid residues. Two highly conserved regions have been selected as signature patterns; the first one is located in the N-terminal section and contains one of the six cysteines which are conserved in most, if not all, of these chitinases and which is probably involved in a disulfide bond.

Consensus pattern: C-x(4,5)-F-Y-[ST]-x(3)-[FY]-[LIVMF]-x-A-x(3)-[YF]-x(2)-F-[GSA]
Consensus pattern: [LIVM]-[GSA]-F-x-[STAG](2)-[LIVMFY]-W-[FY]-W-[LIVM]

[ 1] Flach J., Pilet P.-E., Jolles P. Experientia 48:701-716(1992).

[ 2] Henrissat B. Biochem. J. 280:309-316(1991).

[0396] 116. chloroa_b-bind
Chlorophyll A-B binding proteins. Number of members: 211
[0397] 117. chromo
The 'chromo' (CHRomatin Organization MOdifier) domain [1 to 4] is a conserved region of about 60 amino acids which was originally found in Drosophila modifiers of variegation, which are proteins that modify the structure of chromatin to the condensed morphology of heterochromatin, a cytologically visible condition where gene expression is repressed. In protein Polycomb, the chromo domain has been shown to be important for chromatin targeting. Proteins that contain a chromo domain seem to fall into three classes:

a) Proteins which have an N-terminal chromo domain followed by a region which is related to but distinct from the chromo domain and which has been termed [3] the 'chromo shadow' domain.

b) Proteins with a single chromo domain.

c) Proteins with paired tandem chromo domains.

[0398] Currently, this domain has been found in the following proteins:
[0399] Class A.

- Drosophila heterochromatin protein Su(var)205 (HP1).
- Human heterochromatin protein HP1 alpha.
- Mammalian modifier 1 and modifier 2.
- Fission yeast swi6, a protein involved in the repression of the silent mating-type loci mat2 and mat3.

[0400] Class B.

- Drosophila protein Polycomb (Pc).
- Mammalian modifier 3, a homolog of Pc.
- Drosophila protein Su(var)3-9, a suppressor of position-effect variegation.
- Human Mi-2 autoantigen, characteristic of dermatomyositis.
- Fungal retrotransposon polyproteins: 'skippy' from Fusarium oxysporum, 'grasshopper' and 'MAGGY' from Magnaporthe grisea and CfT-1 from Cladosporium fulvum.
- Fission yeast hypothetical protein SpAC18G6.02c.
- Caenorhabditis elegans hypothetical protein C29H12.5.
- Caenorhabditis elegans hypothetical protein ZK1236.2.
- Caenorhabditis elegans hypothetical protein T09A5.8.

[0401] Class C.

- Mammalian DNA-binding/helicase proteins CHD-1 to CHD-4.
- Yeast protein CHD1.

[0402] The signature pattern for this domain corresponds to its best conserved section, which is located in its central part.

- Consensus pattern: [FYL]-x-[LIVMC]-[KR]-W-x-[GDNR]-[FYWLME]-x(5,6)-[ST]-...

[ 1] Paro R. Trends Genet. 6:416-421(1990).
[ 2] Singh P.B., Miller J.R., Pearce J., Kothary R., Burton R.D., Paro R., James T.C., Gaunt S.J. Nucleic Acids Res. 19:789-794(1991).

[ 3] Aasland R., Stewart A.F. Nucleic Acids Res. 23:3168-3173(1995).

[ 4] Koonin E.V., Zhou S., Lucchesi J.C. Nucleic Acids Res. 23:4229-4233(1995).

[0403] 118. citrate_synt
Citrate synthase (EC 4.1.3.7) (CS) is the tricarboxylic acid cycle enzyme that catalyzes the synthesis of citrate from oxaloacetate and acetyl-CoA in an aldol condensation. CS can directly form a carbon-carbon bond in the absence of metal ion cofactors.

[0404] In prokaryotes, citrate synthase is composed of six identical subunits. In eukaryotes, there are two isozymes of citrate synthase: one is found in the mitochondrial matrix, the second is cytoplasmic. Both seem to be dimers of identical chains.

[0405] There are a number of regions of sequence similarity between prokaryotic and eukaryotic citrate synthases. One of the best conserved contains a histidine which is one of three residues shown [1] to be involved in the catalytic mechanism of the vertebrate mitochondrial enzyme. This region has been used as a signature pattern.

- Consensus pattern: G-[FYA]-[GA]-H-x-[IV]-x(1,2)-[RKT]-x(2)-D-[PS]-R [H is an active site residue]

[0406] [1] Karpusas M., Branchaud B., Remington S.J. Biochemistry 29:2213-2219(1990).
[0407] 119. clpA_B
Chaperonin clpA/B
CAUTION! This family is a subfamily of the AAA superfamily. The threshold has been set very high to stop overlaps with the AAA superfamily. This entry will be subsumed by AAA in the future.
Number of members: 39

[0408] A number of ATP-binding proteins that are thought to protect cells from extreme stress by controlling the aggregation or denaturation of vital cellular structures have been shown [1,2] to be evolutionarily related. These proteins are listed below.

- Escherichia coli clpA, which acts as the regulatory subunit of the ATP-dependent protease clp.
- Rhodopseudomonas blastica clpA homolog.
- Escherichia coli heat shock protein clpB and homologs in other bacteria.
- Bacillus subtilis protein mecB.
- Yeast heat shock protein 104 (gene HSP104), which is vital for tolerance to heat, ethanol and other stresses.
- Neurospora heat shock protein hsp98.
- Yeast mitochondrial heat shock protein 78 (gene HSP78) [3].
- CD4A and CD4B, two highly related tomato proteins that seem to be located in the chloroplast.
- Trypanosoma brucei protein clp.
- Porphyra purpurea chloroplast encoded clpC.

[0409] The size of these proteins ranges from 84 Kd (clpA) to slightly more than 100 Kd (HSP104). They all share two conserved regions of about 200 amino acids that each contain an ATP-binding site. In addition to the ATP-binding A and B motifs there are many parts in these two domains that are also conserved. Two of these regions have been selected as signature patterns. The first signature is located in the first domain, some ten residues to the C-terminal of the ATP-binding B motif. The second pattern is located in the second domain, in between the ATP-binding A and B motifs.

- Consensus pattern: D-[AI]-[SGA]-N-[LIVMF](2)-K-[PT]-x-L-x(2)-G

- Consensus pattern: R-[LIVMFY]-D-x-S-E-[LIVMFY]-x-E-[KRQ]-x-[STA]-x-[STA]-[KR]-[LIVM]-x-G-[STA]
[ 1] Gottesman S., Squires C., Pichersky E., Carrington M., Hobbs M., Mattick J.S., Dalrymple B., Kuramitsu H., Shiroza T., Foster T., Clark W.P., Ross B., Squires C.L., Maurizi M.R. Proc. Natl. Acad. Sci. U.S.A. 87:3513-3517(1990).

[ 2] Parsell D.A., Sanchez Y., Stitzel J.D., Lindquist S. Nature 353:270-273(1991).
[ 3] Leonhardt S.A., Fearon K., Danese P.N., Mason T.L. Mol. Cell. Biol. 13:6304-6313(1993).

[0410] 120. cofilin_ADF
Cofilin/tropomyosin-type actin-binding proteins
[1] Medline: 97290449
Structure determination of yeast cofilin.
Fedorov AA, Lappalainen P, Fedorov EV, Drubin DG, Almo SC;
Nat Struct Biol 1997;4:366-369.
[2] Medline: 97290450
Crystal structure of the actin-binding protein actophorin from Acanthamoeba.
Leonard SA, Gittis AG, Petrella EC, Pollard TD, Lattman EE;
Nat Struct Biol 1997;4:369-373.
[3] Medline: 97420794
F-actin and G-actin binding are uncoupled by mutation of conserved tyrosine residues in maize actin depolymerizing factor.
Jiang CJ, Weeds AG, Khan S, Hussey PJ;
Proc Natl Acad Sci U S A 1997;94:9973-9978.
[4] Medline: 97357155
Cofilin promotes rapid actin filament turnover in vivo.
Lappalainen P, Drubin DG;
Nature 1997;388:78-82.
Severs actin filaments and binds to actin monomers.
Number of members: 44

[0411] Actin-depolymerizing proteins sever actin filaments (F-actin) and/or bind to actin monomers (G-actin), thus
preventing actin polymerization by sequestering the monomers. The following proteins are evolutionarily related and
belong to a family of low molecular weight (137 to 166 residues) actin-depolymerizing proteins [1,2,3,4]:

- Cofilin from vertebrates, slime mold and yeast. Cofilin binds to F-actin and acts as a pH-dependent
  actin-depolymerizing protein.
- Destrin from vertebrates. Destrin binds to G-actin in a pH-independent manner and prevents polymerization.
- Caenorhabditis elegans unc-60.
- Acanthamoeba castellanii actophorin.
- Plant actin depolymerizing factor (ADF).

[0412] The most conserved region of these proteins is a twenty amino-acid segment that ends some 30 residues 
from their C-terminal extremity. This segment has been shown [5] to be important for actin-binding. 

- Consensus pattern: P-[DE]-x-[SA]-x-[LIVMT]-[KR]-x-[KR]-M-[LIVM]-[YA]-[STA](3)-x(3)-[LIVMF]-[KR]

[1] Hawkins M., Pope B., MacIver S.K., Weeds A.G. Biochemistry 32:9985-9993(1993).
[2] Iida K., Moriyama K., Matsumoto S., Kawasaki H., Nishida E., Yahara I. Gene 124:115-120(1993).
[3] Quirk S., MacIver S.K., Ampe C., Doberstein S.K., Kaiser D.A., van Damme J., Vandekerckhove J., Pollard T.D.
Biochemistry 32:8525-8533(1993).
[4] McKim K.S., Matheson C., Marra M.A., Wakarchuk M.F., Baillie D.L. Mol. Gen. Genet. 242:346-357(1994).
[5] Moriyama K., Yonezawa N., Sakai H., Yahara I., Nishida E. J. Biol. Chem. 267:7240-7244(1992).
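
The consensus patterns in these entries use a compact notation: `x` for any residue, `[...]` for a set of allowed residues, `{...}` for a set of excluded residues, and `(n)` or `(n,m)` for repeat counts. As a minimal sketch, assuming that reading of the notation, such a pattern could be translated into a regular expression as follows. The function name `prosite_to_regex` and the test sequence are hypothetical and serve only as illustration; they are not part of the described procedure.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Separate an optional repeat suffix such as "(3)" or "(2,4)" from the element.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            token = "."                         # any residue
        elif core.startswith("[") and core.endswith("]"):
            token = core                        # allowed residues
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"     # excluded residues
        else:
            token = re.escape(core)             # a single fixed residue
        if lo and hi:
            token += "{%s,%s}" % (lo, hi)
        elif lo:
            token += "{%s}" % lo
        parts.append(token)
    return "".join(parts)

# The cofilin/ADF signature from the entry above.
COFILIN = "P-[DE]-x-[SA]-x-[LIVMT]-[KR]-x-[KR]-M-[LIVM]-[YA]-[STA](3)-x(3)-[LIVMF]-[KR]"
cofilin_regex = prosite_to_regex(COFILIN)

# Made-up sequence constructed to contain one 20-residue match.
hit = re.search(cofilin_regex, "MGPDASALKAKMLYSTAAAALKEE")
```

Here `hit.start()` gives the zero-based offset of the matching segment. A real scan would run over translated sequences and would need to handle ambiguous residue codes, which this sketch ignores.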

[0413] 121. (Complex 24kd) Respiratory-chain NADH dehydrogenase 24 Kd subunit signature
Respiratory-chain NADH dehydrogenase (EC 1.6.5.3) [1,2] (also known as complex I or NADH-ubiquinone oxidoreductase)
is an oligomeric enzymatic complex located in the inner mitochondrial membrane which also seems to exist in the
chloroplast and in cyanobacteria (as a NADH-plastoquinone oxidoreductase). Among the 25 to 30 polypeptide subunits
of this bioenergetic enzyme complex there is one with a molecular weight of 24 Kd (in mammals), which is a component
of the iron-sulfur (IP) fragment of the enzyme. It seems to bind a 2Fe-2S iron-sulfur cluster. The 24 Kd subunit is
nuclear encoded, as a precursor form with a transit peptide, in mammals and in Neurospora crassa. The 24 Kd subunit
is highly similar to [3,4]:
- Subunit E of Escherichia coli NADH-ubiquinone oxidoreductase (gene nuoE).
- Subunit NQO2 of Paracoccus denitrificans NADH-ubiquinone oxidoreductase.
A highly conserved region, located in the central section of this subunit and containing two conserved cysteines
that are probably involved in the binding of the 2Fe-2S center, has been selected as a signature pattern.

- Consensus pattern: D-x(2)-F-[ST]-x(5)-C-L-G-x-C-x(2)-[GA]-P [The two C's are putative 2Fe-2S ligands]

[1] Ragan C.I. Curr. Top. Bioenerg. 15:1-36(1987).
[2] Weiss H., Friedrich T., Hofhaus G., Preis D. Eur. J. Biochem. 197:563-576(1991).
[3] Fearnley I.M., Walker J.E. Biochim. Biophys. Acta 1140:105-134(1992).
[4] Weidner U., Geier S., Rock A., Friedrich T., Leif H., Weiss H. J. Mol. Biol. 233:109-122(1993).

[0414] 122. copper-bind 

Copper binding proteins, plastocyanin/azurin family 
Number of members: 70 

[0415] Blue or 'type-1' copper proteins are small proteins which bind a single copper atom and which are
characterized by an intense electronic absorption band near 600 nm [1,2]. The best known members of this class of
proteins are the plant chloroplastic plastocyanins, which exchange electrons with cytochrome c6, and the distantly
related bacterial azurins, which exchange electrons with cytochrome c551. This family of proteins also includes all
the proteins listed below (references are only provided for recently determined sequences).

- Amicyanin from bacteria such as Methylobacterium extorquens or Thiobacillus versutus that can grow on
  methylamine. Amicyanin appears to be an electron receptor for methylamine dehydrogenase.
- Auracyanins A and B from Chloroflexus aurantiacus [3]. These proteins can donate electrons to cytochrome c-554.
- Blue copper protein from Alcaligenes faecalis.
- Cupredoxin (CPC) from cucumber peelings [4].
- Cusacyanin (basic blue protein; plantacyanin, CBP) from cucumber.
- Halocyanin from Natronobacterium pharaonis [5], a membrane associated copper-binding protein.
- Pseudoazurin from Pseudomonas.
- Rusticyanin from Thiobacillus ferrooxidans. Rusticyanin is an electron carrier from cytochrome c-552 to the
  a-type oxidase [6].
- Stellacyanin from the Japanese lacquer tree.
- Umecyanin from horseradish roots.
- Allergen Ra3 from ragweed. This pollen protein is evolutionarily related to the above proteins, but seems to
  have lost the ability to bind copper.

[0416] Although there is an appreciable amount of divergence in the sequence of all these proteins, the copper ligand
sites are conserved and a pattern which includes two of the ligands (a cysteine and a histidine) has been developed.

- Consensus pattern: [GA]-x(0,2)-[YSA]-x(0,1)-[VFY]-x-C-x(1,2)-[PG]-x(0,1)-H-x(2,4)-[MQ] [C and H are copper
  ligands]

[1] Garrett T.P.J., Clingeleffer D.J., Guss J.M., Rogers S.J., Freeman H.C. J. Biol. Chem. 259:2822-2825(1984).
[2] Ryden L.G., Hunt L.T. J. Mol. Evol. 36:41-66(1993).
[3] McManus J.D., Brune D.C., Han J., Sanders-Loehr J., Meyer T.E., Cusanovich M.A., Tollin G., Blankenship R.E.
J. Biol. Chem. 267:6531-6540(1992).
[4] Mann K., Schaefer W., Thoenes U., Messerschmidt A., Mehrabian Z., Nalbandyan R. FEBS Lett. 314:220-223(1992).
[5] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D., Engelhard M. J. Biol. Chem. 269:14939-14945
(1994).
[6] Yano T., Fukumori Y., Yamanaka T. FEBS Lett. 288:159-162(1991).
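
Because the plastocyanin/azurin pattern contains variable-length gaps such as x(0,2) and x(1,2), the positions of the two copper ligands cannot be read off at fixed offsets from the start of a match. One way to recover them, sketched here under the assumption that the pattern is written directly as a regular expression, is to place capture groups around the C and the H. The helper name `copper_ligands` and the short test fragment are illustrative assumptions, not part of the described method.

```python
import re

# [GA]-x(0,2)-[YSA]-x(0,1)-[VFY]-x-C-x(1,2)-[PG]-x(0,1)-H-x(2,4)-[MQ]
# The two ligand residues are wrapped in capture groups 1 (Cys) and 2 (His).
COPPER_SITE = re.compile(
    r"[GA].{0,2}[YSA].{0,1}[VFY].(C).{1,2}[PG].{0,1}(H).{2,4}[MQ]"
)

def copper_ligands(seq):
    """Return 1-based (Cys, His) positions of the first match, or None."""
    m = COPPER_SITE.search(seq)
    if m is None:
        return None
    return (m.start(1) + 1, m.start(2) + 1)

# Short illustrative fragment built to satisfy the pattern.
ligands = copper_ligands("GEYSFYCSPHQGAGM")
```

For this fragment the cysteine is residue 7 and the histidine residue 10. Positions are relative to the fragment passed in, so a caller working on a full-length sequence would get coordinates in that sequence.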

[0417] 123. Chaperonins cpn10 signature

Chaperonins [1,2] are proteins involved in the folding of proteins or the assembly of oligomeric protein complexes.
They seem to assist other polypeptides in maintaining or assuming conformations which permit their correct assembly
into oligomeric structures. They are found in abundance in prokaryotes, chloroplasts and mitochondria. Chaperonins
form oligomeric complexes and are composed of two different types of subunits: a 60 Kd protein, known as cpn60
(groEL in bacteria) and a 10 Kd protein, known as cpn10 (groES in bacteria). The cpn10 protein binds to cpn60 in the
presence of MgATP and suppresses the ATPase activity of the latter. Cpn10 is a protein of about 100 amino acid
residues whose sequence is well conserved in bacteria, vertebrate mitochondria and plant chloroplasts [3,4]. Cpn10
assembles as a heptamer that forms a dome [5]. As a signature pattern for cpn10, a region located in the N-terminal
section of the protein was selected.

Consensus pattern: [LIVMFY]-x-P-[ILT]-x-[DEN]-[KR]-[LIVMFA](3)-[KREQ]-x(8,9)-[SG]-x-[LIVMFY](3)
Note: this pattern is found twice in the plant chloroplast protein, which consists of a tandem repeat of a cpn10
domain.

[1] Ellis R.J., van der Vies S.M. Annu. Rev. Biochem. 60:321-347(1991).
[2] Zeilstra-Ryalls J., Fayet O., Georgopoulos C. Annu. Rev. Microbiol. 45:301-325(1991).
[3] Hartman D.J., Hoogenraad N.J., Condron R., Hoj P.B. Proc. Natl. Acad. Sci. U.S.A. 89:3394-3398(1992).
[4] Bertsch U., Soll J., Seetharam R., Viitanen P.V. Proc. Natl. Acad. Sci. U.S.A. 89:8696-8700(1992).
[5] Hunt J.F., Weaver A.J., Landry S.J., Gierasch L., Deisenhofer J. Nature 379:37-45(1996).

[0418] 124. Chaperonins cpn60 signature (cpn60_TCP1)

Chaperonins [1,2] are proteins involved in the folding of proteins or the assembly of oligomeric protein complexes.
Their role seems to be to assist other polypeptides to maintain or assume conformations which permit their correct
assembly into oligomeric structures. They are found in abundance in prokaryotes, chloroplasts and mitochondria.
Chaperonins form oligomeric complexes and are composed of two different types of subunits: a 60 Kd protein, known as
cpn60 (groEL in bacteria) and a 10 Kd protein, known as cpn10 (groES in bacteria). The cpn60 protein shows weak
ATPase activity and is a highly conserved protein of about 550 to 580 amino acid residues which has been described
by different names in different species:

- Escherichia coli groEL protein, which is essential for the growth of the bacteria and the assembly of several
  bacteriophages.
- Cyanobacterial groEL analogues.
- Mycobacterium tuberculosis and leprae 65 Kd antigen, Coxiella burnetii heat shock protein B (gene htpB),
  Rickettsia tsutsugamushi major antigen 58, and Chlamydial 57 Kd hypersensitivity antigen (gene hypB).
- Chloroplast RuBisCO subunit binding-protein alpha and beta chains, which bind ribulose bisphosphate carboxylase
  small and large subunits and are implicated in the assembly of the enzyme oligomer.
- Mammalian mitochondrial matrix protein P1 (mitonin or P60).
- Yeast HSP60 protein, a mitochondrial assembly factor.

As a signature pattern for these proteins, a rather well-conserved region of twelve residues, located in the last
third of the cpn60 sequence, was chosen.
Consensus pattern: A-[AS]-x-[DEQ]-E-x(4)-G-G-[GA]

[1] Ellis R.J., van der Vies S.M. Annu. Rev. Biochem. 60:321-347(1991).
[2] Zeilstra-Ryalls J., Fayet O., Georgopoulos C. Annu. Rev. Microbiol. 45:301-325(1991).

[0419] Chaperonins TCP-1 signatures (cpn60_TCP1)

The TCP-1 protein [1,2] (Tailless Complex Polypeptide 1) was first identified in mice, where it is especially abundant
in testis but present in all cell types. It has since been found and characterized in many other mammalian species,
in Drosophila and in yeast. TCP-1 is a highly conserved protein of about 60 Kd (556 to 560 residues) which
participates in a hetero-oligomeric 900 Kd double-torus shaped particle [3] with 6 to 8 other different subunits.
These subunits, the chaperonin containing TCP-1 (CCT) subunits beta, gamma, delta, epsilon, zeta and eta, are
evolutionarily related to TCP-1 itself [4,5]. The CCT is known to act as a molecular chaperone for tubulin, actin and
probably some other proteins.
[0420] The CCT subunits are highly related to archaebacterial counterparts:
- TF55 and TF56 [6], a molecular chaperone from Sulfolobus shibatae. TF55 has ATPase activity, is known to bind
  unfolded polypeptides and forms an oligomeric complex of two stacked nine-membered rings.
- Thermosome [7], from Thermoplasma acidophilum. The thermosome is composed of two subunits (alpha and beta) and
  also seems to be a chaperone with ATPase activity. It forms an oligomeric complex of eight-membered rings.
The TCP-1 family of proteins is weakly, but significantly [8], related to the cpn60/groEL chaperonin family (see
<PDOC00268>). As signature patterns of this family of chaperonins, three conserved regions located in the N-terminal
domain were chosen.
Consensus pattern: [RKEL]-[ST]-x-[LMFY]-G-P-x-[GSA]-x-x-K-[LIVMF](2)
Consensus pattern: [LIVM]-[TS]-[NK]-D-[GA]-[AVNHK]-[TAV]-[LIVM](2)-x(2)-[LIVM]-x-[LIVM]-x-[SNH]-[PQH]
Consensus pattern: Q-[DEK]-x-x-[LIVMGTA]-[GA]-D-G-T

[1] Ellis J. Nature 358:191-192(1992).
[2] Nelson R.J., Craig E.A. Curr. Biol. 2:487-489(1992).
[3] Lewis V.A., Hynes G.M., Zheng D., Saibil H., Willison K.R. Nature 358:249-252(1992).
[4] Kubota H., Hynes G., Carne A., Ashworth A., Willison K.R. Curr. Biol. 4:89-99(1994).
[5] Kim S., Willison K.R., Horwich A.L. Trends Biochem. Sci. 20:543-548(1994).
[6] Trent J.D., Nimmesgern E., Wall J.S., Hartl F.U., Horwich A.L. Nature 354:490-493(1991).
[7] Waldmann T., Lupas A., Kellermann J., Peters J., Baumeister W. Biol. Chem. Hoppe-Seyler 376:119-126(1995).
[8] Hemmingsen S.M. Nature 357:650-650(1992).

[0421] 125. cyclin (Cyclins)

The cyclins include an internal duplication, which is related to that found in TFIIB and the RB protein.
[1]

Medline: 94203808

Evidence for a protein domain superfamily shared by the cyclins,
TFIIB and RB/p107.
Gibson TJ, Thompson JD, Blocker A, Kouzarides T;
Nucleic Acids Res 1994;22:946-952.
[2]

Medline: 96164440
The crystal structure of cyclin A
Brown NR, Noble MEM, Endicott JA, Garman EF, Wakatsuki S,
Mitchell E, Rasmussen B, Hunt T, Johnson LN;
Structure 1995;3:1235-1247.
Complex of cyclin and cyclin dependent kinase.
[3]

Medline: 96313126

Structural basis of cyclin-dependent kinase activation by phosphorylation.
Russo AA, Jeffrey PD, Pavletich NP;
Nat Struct Biol 1996;3:696-700.
Cyclins regulate cyclin dependent kinases (CDKs).

The most divergent prosite members have been included. Swiss:P22674, the Uracil-DNA glycosylase 2, is the highest
noise and may be related but has not been included.

Number of members: 189
[0422] Cyclins [1,2,3] are eukaryotic proteins which play an active role in controlling nuclear cell division cycles. 
Cyclins, together with the p34 (cdc2) or cdk2 kinases, form the Maturation Promoting Factor (MPF). There are two 
main groups of cyclins: 

- G2/M cyclins, essential for the control of the cell cycle at the G2/M (mitosis) transition. G2/M cyclins accumulate 
steadily during G2 and are abruptly destroyed as cells exit from mitosis (at the end of the M-phase). 

- G1/S cyclins, essential for the control of the cell cycle at the G1/S (start) transition. 

[0423] In most species, there are multiple forms of G1 and G2 cyclins. For example, in vertebrates, there are two
G2 cyclins, A and B, and at least three G1 cyclins, C, D, and E.

[0424] A cyclin homolog has also been found in herpesvirus saimiri [4]. 

[0425] The best conserved region is in the central part of the cyclins' sequences, known as the 'cyclin-box'. From
this, a 32 residue pattern has been derived.

- Consensus pattern: R-x(2)-[LIVMSA]-x(2)-[FYWS]-[LIVM]-x(8)-[LIVMFC]-x(4)-[LIVMFYA]-x(2)-[STAGC]-[LIVMFYQ]-x-
  [LIVMFYC]-[LIVMFY]-D-[RKH]-[LIVMFYW]

[1] Nurse P. Nature 344:503-508(1990).
[2] Norbury C., Nurse P. Curr. Biol. 1:23-24(1991).
[3] Lew D.J., Reed S.I. Trends Cell Biol. 2:77-81(1992).
[4] Nicholas J., Cameron K.R., Honess R.W. Nature 355:362-365(1992).

[0426] 126. Cystatin domain 

This is a very diverse family. Attempts to define separate subfamilies have failed. Typically, either the N-terminal or C- 
terminal end is very divergent. But splitting into two domains would make very short families. Cathelicidins are related 
to this family but have not been included. Number of members: 147 

[0427] Inhibitors of cysteine proteases [1,2,3], which are found in the tissues and body fluids of animals, in the
larva of the worm Onchocerca volvulus [4], as well as in plants, can be grouped into three distinct but related
families:

- Type 1 cystatins (or stefins), molecules of about 100 amino acid residues with neither disulfide bonds nor
  carbohydrate groups.
- Type 2 cystatins, molecules of about 115 amino acid residues which contain one or two disulfide loops near their
  C-terminus.
- Kininogens, which are multifunctional plasma glycoproteins.

[0428] Kininogens are the precursor of the active peptide bradykinin and play a role in blood coagulation by helping
to optimally position prekallikrein and factor XI next to factor XII. They are also inhibitors of cysteine proteases.
Structurally, kininogens are made of three contiguous type-2 cystatin domains, followed by an additional domain (of
variable length) which contains the sequence of bradykinin. The first of the three cystatin domains seems to have
lost its inhibitory activity.

[0429] In all these inhibitors, there is a conserved region of five residues which has been proposed to be important 
for the binding to the cysteine proteases. The consensus pattern starts one residue before this conserved region. 

- Consensus pattern: [GSTEQKRV]-Q-[LIVT]-[VAF]-[SAGQ]-G-x-[LIVMNK]-x(2)-[LIVMFY]-x-[LIVMFYA]-[DENQKRHSIV]

[1] Barrett A.J. Trends Biochem. Sci. 12:193-196(1987).
[2] Rawlings N.D., Barrett A.J. J. Mol. Evol. 30:60-71(1990).
[3] Turk V., Bode W. FEBS Lett. 285:213-219(1991).
[4] Lustigman S., Brotman B., Huima T., Prince A.M. Mol. Biochem. Parasitol. 45:65-76(1991).

[0430] 127. cytochrome_c (Cytochrome c)
The Pfam entry does not include all prosite members.
The cytochrome 556 and cytochrome c* families are not included.
Number of members: 259

[0431] In proteins belonging to the cytochrome c family [1], the heme group is covalently attached by thioether bonds
to two conserved cysteine residues. The consensus sequence for this site is Cys-X-X-Cys-His, and the histidine residue
is one of the two axial ligands of the heme iron. This arrangement is shared by all proteins known to belong to the
cytochrome c family, which presently includes cytochromes c, c', c1 to c6, c550 to c556, cc3/Hmc, cytochrome f and
reaction center cytochrome c.

- Consensus pattern: C-{CPWHF}-{CPWR}-C-H-{CFYW}

[0432] [1] Mathews F.S. Prog. Biophys. Mol. Biol. 45:1-56(1985).
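
The cytochrome c signature is a short Cys-X-X-Cys-His site in which certain residues are excluded at the X positions and after the histidine. As a hedged sketch, assuming the pattern reads C-{CPWHF}-{CPWR}-C-H-{CFYW}, the sites in a sequence could be listed with a lookahead so that overlapping candidates are not skipped. The function name `find_heme_sites` and the test fragments are hypothetical and used only for illustration.

```python
import re

# C-{CPWHF}-{CPWR}-C-H-{CFYW}: braces become negated character classes.
# The lookahead wrapper lets finditer report overlapping candidate sites.
HEME_SITE = re.compile(r"(?=(C[^CPWHF][^CPWR]CH[^CFYW]))")

def find_heme_sites(seq):
    """Return 1-based (Cys, Cys, His) positions for each candidate heme-binding site."""
    sites = []
    for m in HEME_SITE.finditer(seq):
        first_cys = m.start(1) + 1
        # Second Cys is three residues after the first; His follows it directly.
        sites.append((first_cys, first_cys + 3, first_cys + 4))
    return sites

# Illustrative fragment containing a single C-x-x-C-H site.
sites = find_heme_sites("GDVEKGKKIFVQKCAQCHTVEKGG")
```

In this fragment the two thioether-forming cysteines fall at positions 14 and 17 and the axial histidine at position 18. A fragment such as `CPACHA` yields no site because proline is excluded at the first X position.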

[0433] 128. (DAGKa) Diacylglycerol kinase accessory domain (presumed) 

[0434] Diacylglycerol (DAG) is a second messenger that acts as a protein kinase C activator. This domain is assumed 
to be an accessory domain: its function is unknown. 

[0435] [1] Sakane F, Yamada K, Kanoh H, Yokoyama C, Tanabe T, Nature 1990;344:345-348. [2] Sakane F, Imai S,
Kai M, Wada I, Kanoh H, J Biol Chem 1996;271:8394-8401. [3] Schaap D, de Widt J, van der Wal J, Vandekerckhove
J, van Damme J, Gussow D, Ploegh HL, van Blitterswijk WJ, van der Bend RL, FEBS Lett 1990;275:151-158. [4]
Kanoh H, Yamada K, Sakane F, Trends Biochem Sci 1990;15:47-50.
[0436] 129. (DAGKc) Diacylglycerol kinase catalytic domain (presumed) 

[0437] Diacylglycerol (DAG) is a second messenger that acts as a protein kinase C activator. The catalytic domain
is assumed from the finding of bacterial homologues.

[0438] [1] Sakane F, Yamada K, Kanoh H, Yokoyama C, Tanabe T, Nature 1990;344:345-348. [2] Sakane F, Imai S,
Kai M, Wada I, Kanoh H, J Biol Chem 1996;271:8394-8401. [3] Schaap D, de Widt J, van der Wal J, Vandekerckhove
J, van Damme J, Gussow D, Ploegh HL, van Blitterswijk WJ, van der Bend RL, FEBS Lett 1990;275:151-158. [4]
Kanoh H, Yamada K, Sakane F, Trends Biochem Sci 1990;15:47-50.

[0439] 130. D-amino acid oxidases signature (DAO)

[0440] D-amino acid oxidase (EC 1.4.3.3) (DAMOX or DAO) is an FAD flavoenzyme that catalyzes the oxidation of
neutral and basic D-amino acids into their corresponding keto acids. DAOs have been characterized and sequenced
in fungi and vertebrates, where they are known to be located in the peroxisomes. D-aspartate oxidase (EC 1.4.3.1)
(DASOX) [1] is an enzyme, structurally related to DAO, which catalyzes the same reaction but is active only toward
dicarboxylic D-amino acids. In DAO, a conserved histidine has been shown [2] to be important for the enzyme's
catalytic activity. The conserved region around this residue has been developed as a signature pattern for these
enzymes.
[0441] Consensus pattern: [LIVM](2)-H-[NHA]-Y-G-x-[GSA](2)-x-G-x(5)-G-x-A [H is a probable active site residue]

[1] Negri A., Ceciliani F., Tedeschi G., Simonic T., Ronchi S. J. Biol. Chem. 267:11865-11871(1992).
[2] Miyano M., Fukui K., Watanabe P., Takahashi S., Tada M., Kanashiro M., Miyake Y. J. Biochem. 109:171-177
(1991).

[0442] 131. DEAD and DEAH box families ATP-dependent helicases signatures

A number of eukaryotic and prokaryotic proteins have been characterized [1,2,3] on the basis of their structural
similarity. They all seem to be involved in ATP-dependent, nucleic-acid unwinding. Proteins currently known to belong
to this family are:

- Initiation factor eIF-4A. Found in eukaryotes, this protein is a subunit of a high molecular weight complex
  involved in 5' cap recognition and the binding of mRNA to ribosomes. It is an ATP-dependent RNA-helicase.
- PRP5 and PRP28. These yeast proteins are involved in various ATP-requiring steps of the pre-mRNA splicing process.
- PL10, a mouse protein expressed specifically during spermatogenesis.
- An3, a Xenopus putative RNA helicase, closely related to PL10.
- SPP81/DED1 and DBP1, two yeast proteins probably involved in pre-mRNA splicing and related to PL10.
- Caenorhabditis elegans helicase glh-1.
- MSS116, a yeast protein required for mitochondrial splicing.
- SPB4, a yeast protein involved in the maturation of 25S ribosomal RNA.
- p68, a human nuclear antigen. p68 has ATPase and DNA-helicase activities in vitro. It is involved in cell growth
  and division.
- Rm62 (p62), a Drosophila putative RNA helicase related to p68.
- DBP2, a yeast protein related to p68.
- DHH1, a yeast protein.
- DRS1, a yeast protein involved in ribosome assembly.
- MAK5, a yeast protein involved in maintenance of dsRNA killer plasmid.
- ROK1, a yeast protein.
- ste13, a fission yeast protein.
- Vasa, a Drosophila protein important for oocyte formation and specification of embryonic posterior structures.
- Me31B, a Drosophila maternally expressed protein of unknown function.
- dbpA, an Escherichia coli putative RNA helicase.
- deaD, an Escherichia coli putative RNA helicase which can suppress a mutation in the rpsB gene for ribosomal
  protein S2.
- rhlB, an Escherichia coli putative RNA helicase.
- rhlE, an Escherichia coli putative RNA helicase.
- srmB, an Escherichia coli protein that shows RNA-dependent ATPase activity. It probably interacts with 23S
  ribosomal RNA.
- Caenorhabditis elegans hypothetical proteins T26G10.1, ZK512.2 and ZK686.2.
- Yeast hypothetical protein YHR065c.
- Yeast hypothetical protein YHR169w.
- Fission yeast hypothetical protein SpAC31A2.07c.
- Bacillus subtilis hypothetical protein yxiN.

All these proteins share a number of conserved sequence motifs. Some of them are specific to this family while
others are shared by other ATP-binding proteins or by proteins belonging to the helicases 'superfamily' [4,E1]. One
of these motifs, called the 'D-E-A-D-box', represents a special version of the B motif of ATP-binding proteins. Some
other proteins belong to a subfamily whose members have His instead of the second Asp and are thus said to be
'D-E-A-H-box' proteins [3,5,6,E1]. Proteins currently known to belong to this subfamily are:

- PRP2, PRP16, PRP22 and PRP43. These yeast proteins are all involved in various ATP-requiring steps of the
  pre-mRNA splicing process.
- Fission yeast prh1, which may be involved in pre-mRNA splicing.
- Male-less (mle), a Drosophila protein required in males for dosage compensation of X chromosome linked genes.
- RAD3 from yeast. RAD3 is a DNA helicase involved in excision repair of DNA damaged by UV light, bulky adducts or
  cross-linking agents. Fission yeast rad15 (rhp3) and mammalian DNA excision repair protein XPD (ERCC-2) are the
  homologs of RAD3.
- Yeast CHL1 (or CTF1), which is important for chromosome transmission and normal cell cycle progression in G(2)/M.
- Yeast TPS1.
- Yeast hypothetical protein YKL078w.
- Caenorhabditis elegans hypothetical proteins C06E1.10 and K03H1.2.
- Poxviruses' early transcription factor 70 Kd subunit, which acts with RNA polymerase to initiate transcription
  from early gene promoters.
- I8, a putative vaccinia virus helicase.
- hrpA, an Escherichia coli putative RNA helicase.

Signature patterns for both subfamilies were developed.

[0443] Consensus pattern: [LIVMF](2)-D-E-A-D-[RKEN]-x-[LIVMFYGSTN]
Consensus pattern: [GSAH]-x-[LIVMF](3)-D-E-[ALIV]-H-[NECR]

Note: proteins belonging to this family also contain a copy of the ATP/GTP-binding motif 'A' (P-loop) (see the
relevant entry <PDOC00017>).

[1] Schmid S.R., Linder P. Mol. Microbiol. 6:283-292(1992).
[2] Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K., Schnier J., Slonimski P.P. Nature
337:121-122(1989).
[3] Wassarman D.A., Steitz J.A. Nature 349:463-464(1991).
[4] Hodgman T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).
[5] Harosh I., Deschavanne P. Nucleic Acids Res. 19:6331-6331(1991).
[6] Koonin E.V., Senkevich T.G. J. Gen. Virol. 73:989-993(1992).
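
Since the two subfamilies differ at the fourth position of the box (Asp versus His), a sequence can be assigned to one or the other by testing the two signatures in turn. The following is a minimal sketch under that assumption. The function name `classify_helicase_box` and the test fragments are made up for illustration, and a match to a signature is only suggestive of family membership, not proof of it.

```python
import re

# [LIVMF](2)-D-E-A-D-[RKEN]-x-[LIVMFYGSTN]
DEAD_BOX = re.compile(r"[LIVMF]{2}DEAD[RKEN].[LIVMFYGSTN]")
# [GSAH]-x-[LIVMF](3)-D-E-[ALIV]-H-[NECR]
DEAH_BOX = re.compile(r"[GSAH].[LIVMF]{3}DE[ALIV]H[NECR]")

def classify_helicase_box(seq):
    """Return 'DEAD', 'DEAH', or None depending on which signature is found first."""
    if DEAD_BOX.search(seq):
        return "DEAD"
    if DEAH_BOX.search(seq):
        return "DEAH"
    return None

# Illustrative fragments, one per subfamily.
dead_call = classify_helicase_box("KMFVLDEADEMLSRG")
deah_call = classify_helicase_box("YSGAIIIDEAHERTL")
```

The DEAD signature is tested first, so a sequence carrying both motifs would be reported as 'DEAD'. A caller that needs both answers could run the two searches independently.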

[0444] 132. (DHBP_synthase) 3,4-dihydroxy-2-butanone 4-phosphate synthase

[0445] 3,4-Dihydroxy-2-butanone 4-phosphate is biosynthesized from ribulose 5-phosphate and serves as the
biosynthetic precursor for the xylene ring of riboflavin. Sometimes found as a bifunctional enzyme with
GTP_cyclohydro2.
[0446] Richter G, Krieger C, Volk R, Kis K, Ritz H, Gotze E, Bacher A, Methods Enzymol 1997;280:374-382.
[0447] 133. (DHDPS) Dihydrodipicolinate synthetase signatures

Dihydrodipicolinate synthetase (EC 4.2.1.52) (DHDPS) [1] catalyzes, in higher plant chloroplasts and in many bacteria
(gene dapA), the first reaction specific to the biosynthesis of lysine and of diaminopimelate. DHDPS is responsible
for the condensation of aspartate semialdehyde and pyruvate by a ping-pong mechanism in which pyruvate first binds to
the enzyme by forming a Schiff-base with a lysine residue. Three other proteins are structurally related to DHDPS and
probably also act via a similar catalytic mechanism:
- Escherichia coli N-acetylneuraminate lyase (EC 4.1.3.3) (gene nanA), which catalyzes the condensation of
  N-acetyl-D-mannosamine and pyruvate to form N-acetylneuraminate.
- Rhizobium meliloti protein mosA [3], which is involved in the biosynthesis of the rhizopine
  3-o-methyl-scyllo-inosamine.
- Escherichia coli hypothetical protein yjhH.
Two signature patterns for these enzymes were developed. The first one is centered on a highly conserved region in
the N-terminal part of these proteins. The second signature contains a lysine residue which has been shown, in
Escherichia coli dapA [2], to be the one that forms a Schiff-base with the substrate.

[0448] Consensus pattern: [GSA]-[LIVM]-[LIVMFY]-x(2)-G-[ST]-[TG]-G-E-[GASNF]-x(6)-[EQ]

Consensus pattern: Y-[DNS]-[LIVMFA]-P-x(2)-[ST]-x(3)-[LIVMG]-x(13,14)-[LIVM]-x-[SGA]-[LIVMF]-K-[DEQAF]-[STAC]
[K is involved in Schiff-base formation]

[1] Kaneko T., Hashimoto T., Kumpaisal R., Yamada Y. J. Biol. Chem. 265:17451-17455(1990).
[2] Laber B., Gomis-Rueth F.-X., Romao M.J., Huber R. Biochem. J. 288:691-695(1992).
[3] Murphy P.J., Trenz S.P., Grzemski W., de Bruijn F.J., Schell J. J. Bacteriol. 175:5193-5204(1993).

[0449] 134. (DHOdehase) Dihydroorotate dehydrogenase signatures

Dihydroorotate dehydrogenase (EC 1.3.3.1) (DHOdehase) catalyzes the fourth step in the de novo biosynthesis of
pyrimidine, the conversion of dihydroorotate into orotate. DHOdehase is a ubiquitous FAD flavoprotein. In bacteria
(gene pyrD), DHOdehase is located on the inner side of the cytosolic membrane. In some yeasts, such as
Saccharomyces cerevisiae (gene URA1), it is a cytosolic protein, while in other eukaryotes it is found in the
mitochondria [1]. The sequence of DHOdehase is rather well conserved, and two signature patterns specific to this
enzyme were developed. The first corresponds to a region in the N-terminal section of the enzyme, while the second is
located in the C-terminal section and seems to be part of the FAD-binding domain.

Consensus pattern: [GS]-x(4)-[GK]-[GSTA]-[LIVFSTA]-[GT]-x(3)-[NQR]-x-G-[NHY]-x(2)-P-[RT]
[0450] Consensus pattern: [LIVM](2)-[GSA]-x-G-G-[IV]-x-[STGDN]-x(3)-[ACV]-x(6)-G-A
[0451] [1] Nagy M., Lacroute F., Thomas D. Proc. Natl. Acad. Sci. U.S.A. 89:8966-8970(1992).
[0452] 135. (DMRL_synthase) 6,7-dimethyl-8-ribityllumazine synthase 

[0453] 136. (DNA_methylase) C-5 cytosine-specific DNA methylases signatures

C-5 cytosine-specific DNA methylases (EC 2.1.1.73) (C5 Mtase) are enzymes that specifically methylate the C-5 carbon
of cytosines in DNA [1,2,3]. Such enzymes are found in the proteins described below.
- As a component of type II restriction-modification systems in prokaryotes and some bacteriophages. Such enzymes
  recognize a specific DNA sequence where they methylate a cytosine. In doing so, they protect DNA from cleavage by
  type II restriction enzymes that recognize the same sequence. The sequences of a large number of type II C-5
  Mtases are known.
- In vertebrates, there are a number of C-5 Mtases that methylate CpG dinucleotides. The sequence of the mammalian
  enzyme is known.
C-5 Mtases share a number of short conserved regions. Two of them were selected. The first is centered around
a conserved Pro-Cys dipeptide in which the cysteine has been shown [4] to be involved in the catalytic mechanism; it
appears to form a covalent intermediate with the C6 position of cytosine. The second region is located at the
C-terminal extremity in type II enzymes.

[0454] Consensus pattern: [DENKS]-x-[FLIV]-x(2)-[GSTC]-x-P-C-x(2)-[FYWLIM]-S [C is the active site residue]
Consensus pattern: [RKQGTF]-x(2)-G-N-[STAG]-[LIVMF]-x(3)-[LIVMT]-x(3)-[LIVM]-x(3)-[LIVM]

[1] Posfai J., Bhagwat A.S., Roberts R.J. Gene 74:261-263(1988).
[2] Kumar S., Cheng X., Klimasauskas S., Mi S., Posfai J., Roberts R.J., Wilson G.G. Nucleic Acids Res. 22:1-10
(1994).
[3] Lauster R., Trautner T.A., Noyer-Weidner M. J. Mol. Biol. 206:305-312(1989).
[4] Chen L., McMillan A.M., Chang W., Ezaz-Nikpay K., Lane W.S., Verdine G.L. Biochemistry 30:11018-11025
(1991).

[0455] 137. (DNAphotolyase) DNA photolyases class 2 signatures

Deoxyribodipyrimidine photolyase (EC 4.1.99.3) (DNA photolyase) [1,2] is a DNA repair enzyme. It binds to
UV-damaged DNA containing pyrimidine dimers and, upon absorbing a near-UV photon (300 to 500 nm), breaks the
cyclobutane ring joining the two pyrimidines of the dimer. DNA photolyase is an enzyme that requires two chromophore
cofactors for its activity: a reduced FADH2 and either 5,10-methenyltetrahydrofolate (5,10-MTHF) or an oxidized
8-hydroxy-5-deazaflavin (8-HDF) derivative (F420). The folate or deazaflavin chromophore appears to function as an
antenna, while the FADH2 chromophore is thought to be responsible for electron transfer. On the basis of sequence
similarities [3], DNA photolyases can be grouped into two classes. The second class contains enzymes from Myxococcus
xanthus, methanogenic archaebacteria, insects, fish and marsupial mammals. It is not yet known what second cofactor
is bound to class 2 enzymes. There are a number of conserved sequence regions in all known class 2 DNA photolyases,
especially in the C-terminal part. Two of these regions were selected as signature patterns.
Consensus pattern: F-x-E-E-x-[LIVM](2)-R-R-E-L-x(2)-N-F
Consensus pattern: G-x-H-D-x(2)-W-x-E-R-x-[LIVM]-F-G-K-[LIVM]-R-[FY]-M-N

[1] Sancar G.B., Sancar A. Trends Biochem. Sci. 12:259-261(1987).
[2] Jorns M.S. Biofactors 2:207-211(1990).
[3] Yasui A., Eker A.P.M., Yasuhira S., Yajima H., Kobayashi T., Takao M., Oikawa A. EMBO J. 13:6143-6151
(1994).

[0456] (DNAphotolyase2) DNA photolyases class 1 signatures

Deoxyribodipyrimidine photolyase (EC 4.1.99.3) (DNA photolyase) [1,2] is a DNA repair enzyme. It binds to
UV-damaged DNA containing pyrimidine dimers and, upon absorbing a near-UV photon (300 to 500 nm), breaks the
cyclobutane ring joining the two pyrimidines of the dimer. DNA photolyase is an enzyme that requires two chromophore
cofactors for its activity: a reduced FADH2 and either 5,10-methenyltetrahydrofolate (5,10-MTHF) or an oxidized
8-hydroxy-5-deazaflavin (8-HDF) derivative (F420). The folate or deazaflavin chromophore appears to function as an
antenna, while the FADH2 chromophore is thought to be responsible for electron transfer. On the basis of sequence
similarities [3], DNA photolyases can be grouped into two classes. The first class contains enzymes from
Gram-negative and Gram-positive bacteria, the halophilic archaebacterium Halobacterium halobium, fungi and plants.
Class 1 enzymes bind either 5,10-MTHF (E. coli, fungi, etc.) or 8-HDF (S. griseus, H. halobium). This family also
includes Arabidopsis cryptochromes 1 (CRY1) and 2 (CRY2), which are blue light photoreceptors that mediate blue
light-induced gene expression. There are a number of conserved sequence regions in all known class 1 DNA photolyases,
especially in the C-terminal part. Two of these regions were selected as signature patterns.

[0457] Consensus pattern: T-G-x-P-[LIVM](2)-D-A-x-M-[RA]-x-[LIVM]

Consensus pattern: [DN]-R-x-R-[LIVM](2)-x-[STA](2)-F-[LIVMFA]-x-K-x-L-x(2,3)-W-[KRQ]
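
The consensus patterns in this section use a compact notation: positions are separated by hyphens, x stands for any residue, square brackets list the residues allowed at a position, and a number in parentheses gives a repeat count or range. As a minimal sketch, which is not part of the original disclosure, such a pattern can be translated into a regular expression and used to scan a protein sequence. The function name prosite_to_regex and the toy sequence are hypothetical and serve only as illustration.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a hyphen-separated consensus pattern into a Python regex.

    Assumed notation: 'x' = any residue, '[ABC]' = one of A, B, C,
    '{ABC}' = any residue except A, B, C, and a suffix '(n)' or '(n,m)'
    = repeat count or range.
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        element = element.strip()
        if not element:
            continue
        # Separate the residue part from an optional repeat suffix.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        residue, lo, hi = m.group(1), m.group(2), m.group(3)
        if residue == "x":
            token = "."
        elif residue.startswith("[") and residue.endswith("]"):
            token = residue
        elif residue.startswith("{") and residue.endswith("}"):
            token = "[^" + residue[1:-1] + "]"
        else:
            token = re.escape(residue)
        if lo and hi:
            token += "{%s,%s}" % (lo, hi)
        elif lo:
            token += "{%s}" % lo
        parts.append(token)
    return "".join(parts)


# First class 1 photolyase pattern from the text above.
CLASS1_PATTERN = "T-G-x-P-[LIVM](2)-D-A-x-M-[RA]-x-[LIVM]"

regex = prosite_to_regex(CLASS1_PATTERN)
# Hypothetical toy sequence, not a real photolyase.
hit = re.search(regex, "MKTGSPLVDAAMRQLEE")
print(regex, hit.start() if hit else None)
```

A match only indicates that the sequence contains the conserved region; it does not by itself establish that the protein belongs to the family.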

[1] Sancar G.B., Sancar A. Trends Biochem. Sci. 12:259-261(1987).
[2] Jorns M.S. Biofactors 2:207-211(1990).
[3] Yasui A., Eker A.P.M., Yasuhira S., Yajima H., Kobayashi T., Takao M., Oikawa A. EMBO J. 13:6143-6151(1994).
[4] Lin C., Ahmad M., Cashmore A.R. Plant J. 10:893-902(1996).

[0458] 138. (DNA_pol_A)
DNA polymerase family A signature

Replicative DNA polymerases (EC 2.7.7.7) are the key enzymes catalyzing the accurate replication of DNA. They
require either a small RNA molecule or a protein as a primer for the de novo synthesis of a DNA chain. On the basis
of sequence similarities a number of DNA polymerases have been grouped together [1,2,3] under the designation of
DNA polymerase family A. The polymerases that belong to this family are listed below.

- Escherichia coli and various other bacterial polymerase I (gene polA).
- Thermus aquaticus Taq polymerase.
- Bacteriophage SPO1 polymerase.
- Bacteriophage SPO2 polymerase.
- Bacteriophage T5 polymerase.
- Bacteriophage T7 polymerase.
- Mycobacteriophage L5 polymerase.
- Yeast mitochondrial polymerase gamma (gene MIP1).

[0459] Five regions of similarity are found in all the above polymerases. One of these conserved regions, known as
'motif B' [1], is located in a domain which, in Escherichia coli polA, has been shown to bind deoxynucleotide triphosphate
substrates; it contains a conserved tyrosine which has been shown, by photo-affinity labelling, to be in the active site;
a conserved lysine, also part of this motif, can be chemically labelled using pyridoxal phosphate. This conserved region
was used as a signature for this family of DNA polymerases.

[0460] Consensus pattern: R-x(2)-[GSAV]-K-x(3)-[LIVMFY]-[AGQ]-x(2)-Y-x(2)-[GS]-x(3)-[LIVMA]
Sequences known to belong to this class detected by the pattern: ALL.

[1] Delarue M., Poch O., Tordo N., Moras D., Argos P. Protein Eng. 3:461-467(1990).
[2] Ito J., Braithwaite D.K. Nucleic Acids Res. 19:4045-4057(1991).
[3] Braithwaite D.K., Ito J. Nucleic Acids Res. 21:787-802(1993).

[0461] 139. DNA_pol_viral_C
DNA polymerase (viral) C-terminal domain
Number of members: 128
[0462] 140. (DNA_topoisoII)
DNA topoisomerase II signature

DNA topoisomerase II (EC 5.99.1.3) [1,2,3,4,E1] is one of the two types of enzyme that catalyze the interconversion
of topological DNA isomers. Type II topoisomerases are ATP-dependent and act by passing a DNA segment through
a transient double-strand break. Topoisomerase II is found in phages, archaebacteria, prokaryotes, eukaryotes, and
in African Swine Fever virus (ASF). In bacteriophage T4, topoisomerase II consists of three subunits (the products of
genes 39, 52 and 60). In prokaryotes and in archaebacteria the enzyme, known as DNA gyrase, consists of two subunits
(genes gyrA and gyrB [E2]). In some bacteria, a second type II topoisomerase has been identified; it is known as
topoisomerase IV and is required for chromosome segregation; it also consists of two subunits (genes parC and parE).
In eukaryotes, type II topoisomerase is a homodimer.

[0463] There are many regions of sequence homology between the different subtypes of topoisomerase II. The 
relation between the different subunits is shown in the following representation: 

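<------------------- About 1400 residues ------------------->

[------ Protein 39 --*---][------ Protein 52 ------]   Phage T4
[------ gyrB --------*---][------ gyrA ------------]   Prokaryote II, Archaebacteria
[------ parE --------*---][------ parC ------------]   Prokaryote IV
[--------------------*-----------------------------]   Eukaryote and ASF

'*': Position of the pattern.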

[0464] As a signature pattern for this family of proteins, a region that contains a highly conserved pentapeptide was
selected. The pattern is located in gyrB, in parE, and in protein 39 of phage T4 topoisomerase.
[0465] Consensus pattern: [LIVMA]-x-E-G-[DN]-S-A-x-[STAG]
Sequences known to belong to this class detected by the pattern: ALL.

[1] Sternglanz R. Curr. Opin. Cell Biol. 1:533-535(1990).
[2] Bjornsti M.-A. Curr. Opin. Struct. Biol. 1:99-103(1991).
[3] Sharma A., Mondragon A. Curr. Opin. Struct. Biol. 5:39-47(1995).
[4] Roca J. Trends Biochem. Sci. 20:156-160(1995).

[0466] 141. (DSPc) Tyrosine specific protein phosphatases signature and profiles

Tyrosine specific protein phosphatases (EC 3.1.3.48) (PTPase) [1 to 5] are enzymes that catalyze the removal of a
phosphate group attached to a tyrosine residue. These enzymes are very important in the control of cell growth,
proliferation, differentiation and transformation. Multiple forms of PTPase have been characterized and can be classified
into two categories: soluble PTPases and transmembrane receptor proteins that contain PTPase domain(s). The
currently known PTPases are listed below.

Soluble PTPases.

- PTPN1 (PTP-1B).
- PTPN2 (T-cell PTPase; TC-PTP).
- PTPN3 (H1) and PTPN4 (MEG), enzymes that contain an N-terminal band 4.1-like domain (see <PDOC00566>) and
  could act at junctions between the membrane and cytoskeleton.
- PTPN5 (STEP).
- PTPN6 (PTP-1C; HCP; SHP) and PTPN11 (PTP-2C; SH-PTP3; Syp), enzymes which contain two copies of the SH2 domain
  at their N-terminal extremity. The Drosophila protein corkscrew (gene csw) also belongs to this subgroup.
- PTPN7 (LC-PTP; Hematopoietic protein-tyrosine phosphatase; HePTP).
- PTPN8 (70Z-PEP).
- PTPN9 (MEG2).
- PTPN12 (PTP-G1; PTP-P19).
- Yeast PTP1.
- Yeast PTP2, which may be involved in the ubiquitin-mediated protein degradation pathway.
- Fission yeast pyp1 and pyp2, which play a role in inhibiting the onset of mitosis.
- Fission yeast pyp3, which contributes to the dephosphorylation of cdc2.
- Yeast CDC14, which may be involved in chromosome segregation.
- Yersinia virulence plasmid PTPases (gene yopH).
- Autographa californica nuclear polyhedrosis virus 19 Kd PTPase.

Dual specificity PTPases.

- DUSP1 (PTPN10; MAP kinase phosphatase-1; MKP-1), which dephosphorylates MAP kinase on both Thr-183 and Tyr-185.
- DUSP2 (PAC-1), a nuclear enzyme that dephosphorylates MAP kinases ERK1 and ERK2 on both Thr and Tyr residues.
- DUSP3 (VHR).
- DUSP4 (HVH2).
- DUSP5 (HVH3).
- DUSP6 (Pyst1; MKP-3).
- DUSP7 (Pyst2; MKP-X).
- Yeast MSG5, a PTPase that dephosphorylates MAP kinase FUS3.
- Yeast YVH1.
- Vaccinia virus H1 PTPase, a dual specificity phosphatase.

Receptor PTPases.

Structurally, all known receptor PTPases are made up of a variable length extracellular domain, followed by a
transmembrane region and a C-terminal catalytic cytoplasmic domain. Some of the receptor PTPases contain
fibronectin type III (FN-III) repeats, immunoglobulin-like domains, MAM domains or carbonic anhydrase-like domains
in their extracellular region. The cytoplasmic region generally contains two copies of the PTPase domain. The first
seems to have enzymatic activity, while the second is inactive but seems to affect substrate specificity of the first.
In these domains, the catalytic cysteine is generally conserved but some other, presumably important, residues are
not. In the following table, the domain structure of known receptor PTPases is shown:

                                          Extracellular                 Intracellular
                                          Ig     FN-3    CAH    MAM     PTPase
Leukocyte common antigen (LCA) (CD45)     0      2       0      0       2
Leukocyte antigen related (LAR)           3      8       0      0       2
Drosophila DLAR                           3      9       0      0       2
Drosophila DPTP                           2      2       0      0       2
PTP-alpha (LRP)                           0      0       0      0       2
PTP-beta                                  0      16      0      0       1
PTP-gamma                                 0      1       1      0       2
PTP-delta                                 0      >7      0      0       2
PTP-epsilon                               0      0       0      0       2
PTP-kappa                                 1      4       0      1       2
PTP-mu                                    1      4       0      1       2
PTP-zeta                                  0      1       1      0       2

PTPase domains consist of about 300 amino acids. There are two conserved cysteines; the second one has been shown
to be absolutely required for activity. Furthermore, a number of conserved residues in its immediate vicinity have
also been shown to be important. A signature pattern for PTPase domains was derived centered on the active site
cysteine. There are three profiles for PTPases: the first one spans the complete domain and is not specific to any
subtype, the second profile is specific to dual-specificity PTPases, and the third one to the PTP subfamily.
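
The domain composition of the receptor PTPases given above can also be held as a small lookup structure. The following is a minimal sketch, not part of the original disclosure. The counts are transcribed from the table in the text, and where digits run together in the source they are read under the assumption of five columns (Ig, FN-3, CAH, MAM, PTPase). Counts are kept as strings because one entry is given as ">7". The names RECEPTOR_PTPASES and with_domain are hypothetical.

```python
# Column order assumed from the table header in the text.
COLUMNS = ("Ig", "FN-3", "CAH", "MAM", "PTPase")

# Counts kept as strings so that the ">7" entry is preserved unchanged.
RECEPTOR_PTPASES = {
    "LCA (CD45)":      ("0", "2",  "0", "0", "2"),
    "LAR":             ("3", "8",  "0", "0", "2"),
    "Drosophila DLAR": ("3", "9",  "0", "0", "2"),
    "Drosophila DPTP": ("2", "2",  "0", "0", "2"),
    "PTP-alpha (LRP)": ("0", "0",  "0", "0", "2"),
    "PTP-beta":        ("0", "16", "0", "0", "1"),
    "PTP-gamma":       ("0", "1",  "1", "0", "2"),
    "PTP-delta":       ("0", ">7", "0", "0", "2"),
    "PTP-epsilon":     ("0", "0",  "0", "0", "2"),
    "PTP-kappa":       ("1", "4",  "0", "1", "2"),
    "PTP-mu":          ("1", "4",  "0", "1", "2"),
    "PTP-zeta":        ("0", "1",  "1", "0", "2"),
}


def with_domain(domain: str) -> list:
    """Return the receptor PTPases that carry at least one copy of `domain`."""
    idx = COLUMNS.index(domain)
    return [name for name, counts in RECEPTOR_PTPASES.items()
            if counts[idx] != "0"]


print(with_domain("MAM"))
print(with_domain("CAH"))
```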

[0467] Consensus pattern: [LIVMF]-H-C-x(2)-G-x(3)-[STC]-[STAGP]-x-[LIVMFY] [C is the active site residue]

[1] Fischer E.H., Charbonneau H., Tonks N.K. Science 253:401-406(1991).
[2] Charbonneau H., Tonks N.K. Annu. Rev. Cell Biol. 8:463-493(1992).
[3] Trowbridge I.S. J. Biol. Chem. 266:23517-23520(1991).
[4] Tonks N.K., Charbonneau H. Trends Biochem. Sci. 14:497-500(1989).
[5] Hunter T. Cell 58:1013-1016(1989).

[0468] 142. (DUF10) Uncharacterized protein family UPF0076 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarity: - Goat antigen UK114, a
human homolog and the corresponding rat protein, which is known as perchloric acid soluble protein (PSP1). PSP1 [2]
may inhibit an initiation stage of cell-free protein synthesis. - Mouse heat-responsive protein HRSP12. - Yeast
chromosome V hypothetical protein YER057c. - Yeast chromosome IX hypothetical protein YIL051c. - Caenorhabditis
elegans hypothetical protein C23G10.2. - Escherichia coli hypothetical protein ycdK. - Escherichia coli hypothetical
protein yhaR. - Escherichia coli hypothetical protein yjgF and HI0719, the corresponding Haemophilus influenzae protein.
- Escherichia coli hypothetical protein yoaB. - Bacillus subtilis hypothetical protein yabJ. - Haemophilus influenzae
hypothetical protein HI1627. - Helicobacter pylori hypothetical protein HP0944. - Lactococcus lactis aldR. - Myxococcus
xanthus dfrA. - Synechocystis strain PCC 6803 hypothetical protein slr0709. - Rhizobium strain NGR234 symbiotic
plasmid hypothetical protein y4sK. - Pyrococcus horikoshii hypothetical protein PH0854. These are small proteins of
around 15 Kd whose sequence is highly conserved. As a signature pattern, a well conserved region located in the
C-terminal part of these proteins was selected.

[0469] Consensus pattern: [PA]-[ASTPV]-R-[SACVF]-x-[LIVMFY]-x(2)-[GSAKR]-x-[LMVA]-x(5,8)-[LIVM]-E-[MI]

[1] Bairoch A. Unpublished observations (1995).
[2] Oka T., Tsuji H., Noda C., Sakai K., Hong Y.-M., Suzuki I., Munoz S., Natori Y. J. Biol. Chem. 270:30060-30067(1995).

[0470] 143. (DUF3)Domain of Unknown Function 3 
Domain apparently occurring exclusively in eubacteria. Unknown function. 
[0471] 144. (DUF6) Integral membrane protein 
[0472] This family includes many hypothetical membrane proteins of unknown function. Many of the proteins contain
two copies of the aligned region.
[0473] 145. (DUF7) Integral membrane protein

[0474] This family includes many hypothetical membrane proteins of unknown function. Swiss:P14502 has been
implicated in resistance to ethidium bromide.

[0475] 146. (DapB) Dihydrodipicolinate reductase signature 

Dihydrodipicolinate reductase (EC 1.3.1.26) catalyzes the second step in the biosynthesis of diaminopimelic acid and
lysine, the NAD or NADP-dependent reduction of 2,3-dihydrodipicolinate into 2,3,4,5-tetrahydrodipicolinate. This
enzyme is present in bacteria (gene dapB) and higher plants. As a signature pattern the best conserved region in this
enzyme was selected. It is located in the central section and is part of the substrate-binding region [1].
[0476] Consensus pattern: E-[IV]-x-E-x-H-x(3)-K-x-D-x-P-S-G-T-A
[0477] [1] Scapin G., Blanchard J.S., Sacchettini J.C. Biochemistry 34:3502-3512(1995).
[0478] 147. DedA family 

[0479] This family combines the DedA related proteins and the YIAN/YGIK family. Members of this family are not
functionally characterised. These proteins contain multiple predicted transmembrane regions.
[0480] 148. DegT/DnrJ/EryCI/StrS family 

[0481] The members of this family exhibit some characteristics of the sensor protein of two-component signal
transduction systems; however, none of the members show any sequence similarity to these protein kinases. The members
of this family do have the typical helix-turn-helix motif of DNA binding proteins.
[0482] [1] Stutzman-Engwall KJ, Otten SL, Hutchinson CR. J Bacteriol 1992;174:144-154.
[0483] 149. (Desaturase) Fatty acid desaturases signatures 

Fatty acid desaturases (EC 1.14.99.-) are enzymes that catalyze the insertion of a double bond at the delta position
of fatty acids. There seem to be two distinct families of fatty acid desaturases which do not seem to be evolutionarily
related. Family 1 is composed of: - Stearoyl-CoA desaturase (SCD) (EC 1.14.99.5) [1]. SCD is a key regulatory enzyme
of unsaturated fatty acid biosynthesis. SCD introduces a cis double bond at the delta(9) position of fatty acyl-CoA's
such as palmitoleoyl- and oleoyl-CoA. SCD is a membrane-bound enzyme that is thought to function as a part of a
multienzyme complex in the endoplasmic reticulum of vertebrates and fungi. As a signature pattern for this family a
conserved region in the C-terminal part of these enzymes was selected; this region is rich in histidine residues and in
aromatic residues. Family 2 is composed of: - Plant stearoyl-acyl-carrier-protein desaturases (EC 1.14.99.6) [2]; these
enzymes catalyze the introduction of a double bond at the delta(9) position of stearoyl-ACP to produce oleoyl-ACP.
This enzyme is responsible for the conversion of saturated fatty acids to unsaturated fatty acids in the synthesis of
vegetable oils. - Cyanobacteria desA [3], an enzyme that can introduce a second cis double bond at the delta(12)
position of fatty acids bound to membrane glycerolipids. DesA is involved in chilling tolerance, the phase transition
temperature of lipids of cellular membranes being dependent on the degree of unsaturation of fatty acids of the
membrane lipids. As a signature pattern for this family a conserved region in the C-terminal part of these enzymes was
selected.

[0484] Consensus pattern: G-E-x-[FY]-H-N-[FY]-H-H-x-F-P-x-D-Y

Consensus pattern: [ST]-[SA]-x(3)-[QR]-[LI]-x(5,6)-D-Y-x(2)-[LIVMFYW]-[LIVM]-[DE]

[1] Kaestner K.H., Ntambi J.M., Kelly T.J. Jr., Lane M.D. J. Biol. Chem. 264:14755-14761(1989).
[2] Shanklin J., Somerville C.R. Proc. Natl. Acad. Sci. U.S.A. 88:2510-2514(1991).
[3] Wada H., Gombos Z., Murata N. Nature 347:200-203(1990).

[0485] 150. Dihydroorotase signatures

Dihydroorotase (EC 3.5.2.3) (DHOase) catalyzes the third step in the de novo biosynthesis of pyrimidine, the conversion
of ureidosuccinic acid (N-carbamoyl-L-aspartate) into dihydroorotate. Dihydroorotase binds a zinc ion which is required
for its catalytic activity [1]. In bacteria, DHOase is a dimer of identical chains of about 400 amino-acid residues (gene
pyrC). In higher eukaryotes, DHOase is part of a large multi-functional protein known as 'rudimentary' in Drosophila
and CAD in mammals, which catalyzes the first three steps of pyrimidine biosynthesis [2]. The DHOase domain is
located in the central part of this polyprotein. In yeasts, DHOase is encoded by a monofunctional protein (gene URA4).
However, a defective DHOase domain [3] is found in a multifunctional protein (gene URA2) that catalyzes the first two
steps of pyrimidine biosynthesis. The comparison of DHOase sequences from various sources shows [4] that there
are two highly conserved regions. The first, located in the N-terminal extremity, contains two histidine residues suggested
[3] to be involved in binding the zinc ion. The second is found in the C-terminal part. Signature patterns for both regions
have been developed. Allantoinase (EC 3.5.2.5) is the enzyme that hydrolyzes allantoin into allantoate. In yeast (gene
DAL1) [5], it is the first enzyme in the allantoin degradation pathway; in amphibians [6] and fish it catalyzes the second
step in the degradation of uric acid. The sequence of allantoinase is evolutionarily related to that of DHOases.
[0486] Consensus pattern: D-[LIVMFYWSAP]-H-[LIVA]-H-[LIVF]-[RN]-x-[PGANF] [The two H's are probable zinc
ligands]

Consensus pattern: [GA]-[ST]-D-x-A-P-H-x(4)-K

[1] Brown D.C., Collins K.D. J. Biol. Chem. 266:1597-1604(1991).
[2] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B. BioEssays 15:157-164(1993).
[3] Souciet J.-L., Nagy M., Le Gouar M., Lacroute F., Potier S. Gene 79:59-70(1989).
[4] Guyonvarch A., Nguyen-Juilleret M., Hubert J.-C., Lacroute F. Mol. Gen. Genet. 212:134-141(1988).
[5] Buckholz R.G., Cooper T.G. Yeast 7:913-923(1991).
[6] Hayashi S., Jain S., Chu R., Alvares K., Xu B., Erfurth F., Usuda N., Rao M.S., Reddy S.K., Noguchi T., Reddy
J.K., Yeldandi A.Y. J. Biol. Chem. 269:12269-12276(1994).



[0487] 151. dnaJ domains signatures and profile

[0488] The prokaryotic heat shock protein dnaJ interacts with the chaperone hsp70-like dnaK protein [1]. Structurally,
the dnaJ protein consists of an N-terminal conserved domain (called 'J' domain) of about 70 amino acids, a glycine-rich
region ('G' domain) of about 30 residues, a central domain containing four repeats of a CXXCXGXG motif ('CRR'
domain) and a C-terminal region of 120 to 170 residues. Such a structure is shown in the following schematic
representation:

+------------+-+-------+-----+-----------+--------------+
| N-terminal | | Gly-R |     | CXXCXGXG  | C-terminal   |
+------------+-+-------+-----+-----------+--------------+


[0489] It has been shown [2] that the 'J' domain as well as the 'CRR' domain are also found in other prokaryotic and
eukaryotic proteins which are listed below.

a) Proteins containing both a 'J' and a 'CRR' domain:

- Yeast protein MAS5/YDJ1, which seems to be involved in mitochondrial protein import.
- Yeast protein MDJ1, involved in mitochondrial biogenesis and protein folding.
- Yeast protein SCJ1, involved in protein sorting.
- Yeast protein XDJ1.
- Plants dnaJ homologs (from leek and cucumber).
- Human HDJ2, a dnaJ homolog of unknown function.
- Yeast hypothetical protein YNL077w.

b) Proteins containing a 'J' domain without a 'CRR' domain:

- Rhizobium fredii nolC, a protein involved in cultivar-specific nodulation of soybean.
- Escherichia coli cbpA [3], a protein that binds curved DNA.
- Yeast protein SEC63/NPL1, important for protein assembly into the endoplasmic reticulum and the nucleus.
- Yeast protein SIS1, required for nuclear migration during mitosis.
- Yeast protein CAJ1.
- Yeast hypothetical protein YFR041c.
- Yeast hypothetical protein YIR004w.
- Yeast hypothetical protein YJL162c.
- Plasmodium falciparum ring-infected erythrocyte surface antigen (RESA). RESA, whose function is not known,
  is associated with the membrane skeleton of newly invaded erythrocytes.
- Human HDJ1.
- Human HSJ1, a neuronal protein.
- Drosophila cysteine-string protein (csp).

[0490] A signature pattern for the 'J' domain was developed, based on conserved positions in the C-terminal half of
this domain. A pattern for the 'CRR' domain, based on the first two copies of that motif, was also developed. A profile
for the 'J' domain was also developed.

[0491] Consensus pattern: [FY]-x(2)-[LIVMA]-x(3)-[FYWHNT]-[DENQSA]-x-L-x-[DN]-x(3)-[KR]-x(2)-[FYI]
Consensus pattern: C-[DEGSTHKR]-x-C-x-G-x-[GK]-[AGSDM]-x(2)-[GSNKR]-x(4,6)-C-x(2,3)-C-x-G-x-G
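
The text above describes the central 'CRR' domain as four repeats of a CXXCXGXG motif. As a rough illustration, which is not part of the original disclosure, the repeats in a candidate sequence can be located by a simple scan. The function name find_crr_repeats and the toy sequence are hypothetical, and the scan reports non-overlapping matches only.

```python
import re

# CXXCXGXG: Cys, two arbitrary residues, Cys, any residue, Gly, any residue, Gly.
CRR_MOTIF = re.compile(r"C..C.G.G")


def find_crr_repeats(sequence: str) -> list:
    """Return 0-based start positions of non-overlapping CXXCXGXG matches."""
    return [m.start() for m in CRR_MOTIF.finditer(sequence.upper())]


# Hypothetical toy sequence carrying two copies of the motif.
toy = "MKCKTCNGSGAAACPDCHGTGLL"
positions = find_crr_repeats(toy)
print(len(positions), positions)
```

A sequence that yields four hits in its central region would be consistent with the domain organization described for dnaJ, although the signature pattern itself is based on the first two copies of the motif only.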

[1] Cyr D.M., Langer T., Douglas M.G. Trends Biochem. Sci. 19:176-181(1994).
[2] Bork P., Sander C., Valencia A., Bukau B. Trends Biochem. Sci. 17:129-129(1992).
[3] Ueguchi C., Kaneda M., Yamada H., Mizuno T. Proc. Natl. Acad. Sci. U.S.A. 91:1054-1058(1994).

[0492] 152.
[0493] 153. Dwarfin

[0494] This family, known as the dwarfins, also includes the Drosophila protein MAD. The N-terminus of MAD can
bind to DNA [2].

[0495] [1] Yingling JM, Das P, Savage C, Zhang M, Padgett RW, Wang XF. Proc Natl Acad Sci U S A 1996;93:8940-8944.
[2] Kim J, Johnson K, Chen HJ, Carroll S, Laughon A. Nature 1997;388:304-308.
[0496] 154. Dynein light chain type 1 signature 

Dynein is a multisubunit microtubule-dependent motor enzyme that acts as the force generating protein of eukaryotic
cilia and flagella. The cytoplasmic isoform of dynein acts as a motor for the intracellular retrograde motility of vesicles
and organelles along microtubules. Dynein is composed of a number of ATP-binding large subunits, intermediate size
subunits and small subunits. Among the small subunits, there is a family [1,2] of highly conserved proteins which consists
of: - Chlamydomonas reinhardtii flagellar outer arm dynein 8 Kd and 11 Kd light chains. - Higher eukaryotes cytoplasmic
dynein light chain 1. - Yeast cytoplasmic dynein light chain 1 (gene DYN2 or SLC1). - Caenorhabditis elegans
hypothetical dynein light chains M18.2 and T26A5.9. These proteins have from 89 to 120 amino acids. As a signature
pattern, a highly conserved region was selected.

Consensus pattern: H-x-I-x-G-[KR]-x-F-[GA]-S-x-V-[ST]-[HY]-E

[1] King S.M., Patel-King R.S. J. Biol. Chem. 270:11445-11452(1995).
[2] Dick T., Ray K., Salz H.K., Chia W. Mol. Cell. Biol. 16:1966-1977(1996).

[0497] 155. dUTPase 

[0498] dUTPase hydrolyzes dUTP to dUMP and pyrophosphate. 

[0499] [1] Cedergren-Zeppezauer ES, Larsson G, Nyman PO, Dauter Z, Wilson KS. Nature 1992;355:740-743.
[2] Mol CD, Harris JM, McIntosh EM, Tainer JA. Structure 1996;4:1077-1092.

[0500] 156. (dCMP_cyt_deam) Cytidine and deoxycytidylate deaminases zinc-binding region signature

Cytidine deaminase (EC 3.5.4.5) (cytidine aminohydrolase) catalyzes the hydrolysis of cytidine into uridine and
ammonia, while deoxycytidylate deaminase (EC 3.5.4.12) (dCMP deaminase) hydrolyzes dCMP into dUMP. Both enzymes
are known to bind zinc and to require it for their catalytic activity [1,2]. These two enzymes do not share any sequence
similarity with the exception of a region that contains three conserved histidine and cysteine residues which are thought
to be involved in the binding of the catalytic zinc ion. Such a region is also found in other proteins [3,4]: - Yeast cytosine
deaminase (EC 3.5.4.1) (gene FCY1), which transforms cytosine into uracil. - Mammalian apolipoprotein B mRNA
editing protein, responsible for the posttranscriptional editing of a CAA codon into a UAA (stop) codon in the APOB
mRNA. - Riboflavin biosynthesis protein ribG, which converts 2,5-diamino-6-(ribosylamino)-4(3H)-pyrimidinone
5'-phosphate into 5-amino-6-(ribosylamino)-2,4(1H,3H)-pyrimidinedione 5'-phosphate. - Bacillus cereus blasticidin-S
deaminase (EC 3.5.4.23), which catalyzes the deamination of the cytosine moiety of the antibiotics blasticidin S,
cytomycin and acetylblasticidin S. - Bacillus subtilis protein comEB. This protein is required for the binding and uptake
of transforming DNA. - Bacillus subtilis hypothetical protein yaaJ. - Escherichia coli hypothetical protein yfhC. - Yeast
hypothetical protein YJL035c. A signature pattern for this zinc-binding region was derived.

[0501] Consensus pattern: [CH]-[AGV]-E-x(2)-[LIVMFGAT]-[LIVM]-x(17,33)-P-C-x(2,8)-C-x(3)-[LIVM] [The C's and
H are zinc ligands]
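
Because this pattern contains variable-length gaps, x(17,33) and x(2,8), the region it matches does not have a fixed length. As a minimal sketch, not part of the original disclosure, the shortest and longest span that a consensus pattern can cover may be computed by summing the repeat counts of its elements. The function name pattern_length_bounds is hypothetical.

```python
import re
from typing import Tuple


def pattern_length_bounds(pattern: str) -> Tuple[int, int]:
    """Return (shortest, longest) number of residues a pattern can span.

    Each hyphen-separated element counts as one residue unless it ends
    with a repeat suffix '(n)' or '(n,m)'.
    """
    shortest = longest = 0
    for element in pattern.split("-"):
        element = element.strip()
        if not element:
            continue
        m = re.search(r"\((\d+)(?:,(\d+))?\)$", element)
        if m:
            lo = int(m.group(1))
            hi = int(m.group(2)) if m.group(2) else lo
        else:
            lo = hi = 1
        shortest += lo
        longest += hi
    return shortest, longest


# Zinc-binding region pattern from the text above.
ZINC_PATTERN = ("[CH]-[AGV]-E-x(2)-[LIVMFGAT]-[LIVM]-x(17,33)"
                "-P-C-x(2,8)-C-x(3)-[LIVM]")
print(pattern_length_bounds(ZINC_PATTERN))  # -> (33, 55)
```

Under this reading, a hit for the zinc-binding signature covers between 33 and 55 residues.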

[1] Yang C., Carlow D., Wolfenden R., Short S.A. Biochemistry 31:4168-4174(1992).
[2] Moore J.T., Silversmith R.E., Maley G.F., Maley F. J. Biol. Chem. 268:2288-2291(1993).
[3] Reizer J., Buskirk S., Bairoch A., Reizer A., Saier M.H. Jr. Protein Sci. 3:853-856(1994).
[4] Bhattacharya S., Navaratnam N., Morrison J.R., Scott J., Taylor W.R. Trends Biochem. Sci. 19:105-106(1994).

[0502] 157. Dehydrins signatures 

A number of proteins are produced by plants that experience water-stress. Water-stress takes place when the water
available to a plant falls below a critical level. The plant hormone abscisic acid (ABA) appears to modulate the response
of plants to water-stress. Proteins that are expressed during water-stress are called dehydrins [1,2] or LEA group 2
proteins [3]. The proteins that belong to this family are listed below.

- Arabidopsis thaliana XERO 1, XERO 2 (LTI30), RAB18, ERD10 (LTI45), ERD14 and COR47.
- Barley dehydrins B8, B9, B17, and B18.
- Cotton LEA protein D-11.
- Craterostigma plantagineum desiccation-related proteins A and B.
- Maize dehydrin M3 (RAB-17).
- Pea dehydrins DHN1, DHN2, and DHN3.
- Radish LEA protein.
- Rice proteins RAB 16B, 16C, 16D, RAB21, and RAB25.
- Tomato TAS14.
- Wheat dehydrin RAB 15 and cold-shock proteins cor410, cs66 and cs120.

[0503] Dehydrins share a number of structural features. One of the most notable features is the presence, in their
central region, of a continuous run of five to nine serines followed by a cluster of charged residues. Such a region has
been found in all known dehydrins so far with the exception of pea dehydrins. A second conserved feature is the
presence of two copies of a lysine-rich octapeptide; the first copy is located just after the cluster of charged residues
that follows the poly-serine region and the second copy is found at the C-terminal extremity. Signature patterns for
both regions were derived.

[0504] Consensus pattern: S(5)-[DE]-x-[DE]-G-x(1,2)-G-x(0,1)-[KR](4)
Consensus pattern: [KR]-[LIM]-K-[DE]-K-[LIM]-P-G

[1] Close T.J., Kortt A.A., Chandler P.M. Plant Mol. Biol. 13:95-108(1989).
[2] Robertson M., Chandler P.M. Plant Mol. Biol. 19:1031-1044(1992).
[3] Dure L. III, Crouch M., Harada J., Ho T.-H. D., Mundy J., Quatrano R., Thomas T., Sung Z.R. Plant Mol. Biol.
12:475-486(1989).

[0505] 158. (deoR) Bacterial regulatory proteins, deoR family signature

The many bacterial transcription regulation proteins which bind DNA through a 'helix-turn-helix' motif can be classified
into subfamilies on the basis of sequence similarities. One of these subfamilies groups the following proteins [1,2]: -
accR, the Agrobacterium tumefaciens plasmid pTiC58 repressor of opine catabolism and conjugal transfer. - agaR,
the Escherichia coli aga operon putative repressor. - deoR, the Escherichia coli deoxyribose operon repressor. - fucR,
the Escherichia coli L-fucose operon activator. - gatR, the Escherichia coli galactitol operon repressor. - glpR, the
Escherichia coli glycerol-3-phosphate regulon repressor. - gutR (or srlR), the Escherichia coli glucitol operon repressor.
- iolR, from Bacillus subtilis. - lacR, the streptococci lactose phosphotransferase system repressor. - spoIIID, the Bacillus
subtilis transcription regulator of the sigK gene. - yfjR, an Escherichia coli hypothetical protein. - ygbI, an Escherichia
coli hypothetical protein. - yihW, an Escherichia coli hypothetical protein. - yjfQ, an Escherichia coli hypothetical protein.
- yjhJ, an Escherichia coli hypothetical protein. The 'helix-turn-helix' DNA-binding motif of these proteins is located in
the N-terminal part of the sequence. The pattern used to detect these proteins starts fourteen residues before the HTH
motif and ends one residue after it.

[0506] Consensus pattern: R-x(3)-[LIVM]-x(3)-[LIVM]-x(16,17)-[STA]-x(2)-T-[LIVMA]-[RH]-[KRNA]-D-[LIVMF]

[1] von Bodman S., Hayman G.T., Farrand S.K. Proc. Natl. Acad. Sci. U.S.A. 89:643-647(1992).
[2] Bairoch A. Unpublished observations (1993).

[0507] 159. dsrm 
Double-stranded RNA binding motif 
[1] Burd CG, Dreyfuss G; Medline: 94310455. Conserved structures and diversity of functions of RNA-binding proteins.
Science 1994;265:615-621.

[0508] Sequences gathered for seed by HMM_iterative_training. Putative motif shared by proteins that bind to dsRNA.
At least some DSRM proteins seem to bind to specific RNA targets. Exemplified by Staufen, which is involved in
localization of at least five different mRNAs in the early Drosophila embryo. Also by interferon-induced protein kinase
in humans, which is part of the cellular response to dsRNA.
[0509] Number of members: 116 
[0510] 160. Dynamin family signature 

Dynamin [1,2] is a microtubule-associated force-producing protein of 100 Kd which is involved in the production of
microtubule bundles and which is able to bind and hydrolyze GTP. Dynamin is structurally related to the following
proteins: - Drosophila shibire protein (gene shi) [3]. Shibire is, very probably, the Drosophila cognate of mammalian
dynamin. It seems to provide the motor for vesicular transport during endocytosis. - Yeast vacuolar sorting protein
VPS1 (or SPO15) [4], a protein which could also be involved in microtubule-associated motility. - Yeast protein MGM1
[5], which is required for mitochondrial genome maintenance. - Yeast protein DNM1, which is involved in endocytosis.
- Interferon induced Mx proteins [6,7]. Interferon alpha or beta induce the synthesis of a family of closely related proteins.
Most of these proteins are known to confer resistance to influenza viruses and/or rhabdoviruses on transfected
mammalian cells in culture. The three motifs found in all GTP-binding proteins are located in the N-terminal part of these
proteins. The signature pattern that was developed for these proteins is based on a highly conserved region downstream
of the ATP/GTP-binding motif 'A' (P-loop) (see <PDOC00017>).
[0511] Consensus pattern: L-P-[RK]-G-[STN]-[GN]-[LIVM]-V-T-R

[1] Vallee R.B., Shpetner H.S. Annu. Rev. Biochem. 59:909-932(1990).
[2] Obar R.A., Collins C.A., Hammarback J.A., Shpetner H.S., Vallee R.B. Nature 347:256-261(1990).
[3] van der Bliek A., Meyerowitz E.M. Nature 351:411-414(1991).
[4] Rothman J.H., Raymond C.K., Gilbert T., O'Hara P.J., Stevens T.H. Cell 61:1063-1074(1990).
[5] Jones B.A., Fangman W.L. Genes Dev. 6:380-389(1992).
[6] Arnheiter H., Meier E. New Biol. 2:851-857(1990).
[7] Staeheli P., Pitossi F., Pavlovic J. Trends Cell Biol. 3:268-272(1993).

[0512] 161. (dynamin_2) Dynamin central region 

[0513] This region lies between the GTPase domain (see dynamin) and the pleckstrin homology (PH) domain.
[0514] 162. E1-E2 ATPases phosphorylation site 

E1-E2 ATPases (also known as P-type) are cation transport ATPases which form an aspartyl phosphate intermediate 
<5 in the course of ATP hydrolysis. ATPases which belong to this family are listed below [1,2,3]. - Fungal and plant plasma 
membrane (H+) ATPases [reviewed in 4]. - Vertebrate (Na+, K+) ATPases (sodium pump) [reviewed in 5,6].-Gastric 
(K+, H+) ATPases (proton pump). - Calcium (Ca++) ATPases (calcium pump) from the sarcoplasmic reticulum (SR). 
the endoplasmic reticulum (ER) and the plasma membrane. - Copper (Cu++) ATPases (copper pump) which are in- 
volved in two human genetic disorders: Menkes syndrome and Wilson disease [7]. - Bacterial potassium (K+) ATPases. 
20 - Bacterial cadmium efflux (Cd++) ATPases [reviewed in 8]. - Bacterial magnesium (Mg++) ATPases. - A probable 
cation ATPase from Leishmania. - fixl, a probable cation ATPase from Rhizobium meliloti, involved in nitrogen fixation. 
The region around the phosphorylated aspartate residue is perfectly conserved in all these ATPases and can be used 
as a signature pattern. 

[0515] Consensus pattern: D-K-T-G-T-[LI]-[TI] [D is phosphorylated] 

[ 1] Green N.M., McLennan D.H. Biochem. Soc. Trans. 17:819-822(1989).
[ 2] Green N.M. Biochem. Soc. Trans. 17:970-972(1989).
[ 3] Fagan M.J., Saier M.H. Jr. J. Mol. Evol. 38:57-99(1994).
[ 4] Serrano R. Biochim. Biophys. Acta 947:1-28(1988).
[ 5] Fambrough D.M. Trends Neurosci. 11:325-328(1988).
[ 6] Sweadner K.J. Biochim. Biophys. Acta 988:185-220(1989).
[ 7] Bull P.C., Cox D.W. Trends Genet. 10:246-251(1994).
[ 8] Silver S., Nucifora G., Chu L., Misra T.K. Trends Biochem. Sci. 14:76-80(1989).
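
The consensus patterns in this listing use a compact notation: residues are separated by hyphens, square brackets list the residues allowed at a position, curly braces list residues that are excluded, x stands for any residue, and a parenthesised number or range gives a repeat count. As a minimal sketch of how such a pattern could be applied to a sequence, the following Python translates the notation into a regular expression. The function names and the test peptide are illustrative assumptions and are not part of the specification.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a consensus pattern in the notation used here into a Python regex."""
    tokens = []
    # Drop a trailing hyphen or period that may be left over from typesetting.
    for element in pattern.strip().rstrip("-.").split("-"):
        element = element.strip()
        # Separate the residue part from an optional repeat such as (3) or (1,2).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"empty pattern element in {pattern!r}")
        core, low, high = m.groups()
        if core == "x":
            token = "."                        # any residue
        elif core.startswith("[") and core.endswith("]"):
            token = core                       # one of the listed residues
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"    # any residue except those listed
        elif len(core) == 1 and core.isalpha() and core.isupper():
            token = core                       # a single fixed residue
        else:
            raise ValueError(f"unsupported pattern element: {element!r}")
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        tokens.append(token)
    return "".join(tokens)


def find_phospho_sites(sequence: str) -> list:
    """Return 1-based positions of the Asp that starts each D-K-T-G-T-[LI]-[TI] match."""
    regex = re.compile(prosite_to_regex("D-K-T-G-T-[LI]-[TI]"))
    return [m.start() + 1 for m in regex.finditer(sequence)]


if __name__ == "__main__":
    # Illustrative peptide (an assumption made for this example only).
    print(find_phospho_sites("AICSDKTGTLTQNRM"))
```

Note that `finditer` reports non-overlapping matches only, which is adequate for a signature of this length but is a simplification compared with a full pattern scanner.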

[0516] 163. E1_N

E1 Protein, N terminal domain
Number of members: 90

[0517] 164. (E1_dehydrog) Dehydrogenase E1 component

[0518] This family uses thiamine pyrophosphate as a cofactor. This family includes pyruvate dehydrogenase, 2-oxoglutarate dehydrogenase and 2-oxoisovalerate dehydrogenase.
[0519] 165. (ECH) Enoyl-CoA hydratase/isomerase signature 

Enoyl-CoA hydratase (EC 4.2.1.17) (ECH) [1] and 3,2-trans-enoyl-CoA isomerase (EC 5.3.3.8) (ECI) [2] are two enzymes involved in fatty acid metabolism. ECH catalyzes the hydration of 2-trans-enoyl-CoA into 3-hydroxyacyl-CoA and ECI shifts the 3- double bond of the intermediates of unsaturated fatty acid oxidation to the 2-trans position. Most eukaryotic cells have two fatty-acid beta-oxidation systems, one located in mitochondria and the other in peroxisomes. In mitochondria, ECH and ECI are separate yet structurally related monofunctional enzymes. Peroxisomes contain a trifunctional enzyme [3] consisting of an N-terminal domain that bears both ECH and ECI activity, and a C-terminal domain responsible for 3-hydroxyacyl-CoA dehydrogenase (HCDH) activity. In Escherichia coli (gene fadB) and Pseudomonas fragi (gene faoA), ECH and ECI are also part of a multifunctional enzyme which contains both a HCDH and a 3-hydroxybutyryl-CoA epimerase domain [4].

A number of other proteins have been found to be evolutionarily related to the ECH/ECI enzymes or domains:

- 3-hydroxybutyryl-CoA dehydratase (EC 4.2.1.55) (crotonase), a bacterial enzyme involved in the butyrate/butanol-producing pathway.
- Naphthoate synthase (EC 4.1.3.36) (DHNA synthetase) (gene menB) [5], a bacterial enzyme involved in the biosynthesis of menaquinone (vitamin K2). DHNA synthetase converts O-succinyl-benzoyl-CoA (OSB-CoA) to 1,4-dihydroxy-2-naphthoic acid (DHNA).
- 4-chlorobenzoate dehalogenase (EC 3.8.1.6) [6], a Pseudomonas enzyme which catalyzes the conversion of 4-chlorobenzoate-CoA to 4-hydroxybenzoate-CoA.
- A Rhodobacter capsulatus protein of unknown function (ORF257) [7].
- Bacillus subtilis putative polyketide biosynthesis proteins pksH and pksI.
- Escherichia coli carnitine racemase (gene caiD) [8].
- Escherichia coli hypothetical protein ygfG.
- Yeast hypothetical protein YDR036c.

As a signature pattern for these enzymes, a conserved region rich in glycine and hydrophobic residues was selected.

[0520] Consensus pattern: [LIVM]-[STA]-x-[LIVM]-[DENQRHSTA]-G-x(3)-[AG](3)-x(4)-[LIVMST]-x-[CSTA]-[LIVMFY]

[ 1] Minami-Ishii N., Taketani S., Osumi T., Hashimoto T. Eur. J. Biochem. 185:73-78(1989).
[ 2] Mueller-Newen G., Stoffel W. Biol. Chem. Hoppe-Seyler 372:613-624(1991).
[ 3] Palosaari P.M., Hiltunen J.K. J. Biol. Chem. 265:2446-2449(1990).
[ 4] Nakahigashi K., Inokuchi H. Nucleic Acids Res. 18:4937-4937(1990).
[ 5] Driscoll J.R., Taber H.W. J. Bacteriol. 174:5063-5071(1992).
[ 6] Babbitt P.C., Kenyon G.L., Martin B.M., Charest H., Sylvestre M., Scholten J.D., Chang K.-H., Liang P.-H., Dunaway-Mariano D. Biochemistry 31:5594-5604(1992).
[ 7] Beckman D.L., Kranz R.G. Gene 107:171-172(1991).
[ 8] Eichler K., Bourgis F., Buchet A., Kleber H.-P., Mandrand-Berthelot M.-A. Mol. Microbiol. 13:775-786(1994).

[0521] 166. (EF1BD) Elongation factor 1 beta/beta'/delta chain signatures

Eukaryotic elongation factor 1 (EF-1) is responsible for the GTP-dependent binding of aminoacyl-tRNAs to the ribosomes [1]. EF-1 is composed of four subunits: the alpha chain which binds GTP and aminoacyl-tRNAs, the gamma chain that probably plays a role in anchoring the complex to other cellular components, and the beta and delta (or beta') chains. The beta and delta chains are highly similar proteins that both stimulate the exchange of GDP bound to the alpha chain for GTP [2]. The beta and delta chains are hydrophilic proteins of around 23 to 31 Kd. Their C-terminal part seems important for the nucleotide exchange activity, while the N-terminal section is probably involved in the interaction with the gamma chain. Two signature patterns for this family of proteins were developed. The first corresponds to an acidic region in the central section; the second, to the C-terminal extremity of these proteins.
[0522] Consensus pattern: [DE]-[DEG]-[DE](2)-[LIVMF]-D-L-F-G- 

Consensus pattern: [IV]-Q-S-x-D-[LIVM]-x-A-[FWM]-[NQ]-K-[LIVM]

[ 1] Riis B., Rattan I.S., Clark B.F.C., Merrick W.C. Trends Biochem. Sci. 15:420-424(1990).
[ 2] van Damme H.T.F., Amons R., Karssies R., Timmers C.J., Janssen G.M.C., Moeller W. Biochim. Biophys. Acta 1050:241-247(1990).

[0523] 167. (EF1G_domain) Elongation factor 1 gamma, conserved domain
[0524] 168. (EFG_C) Elongation factor G C-terminus

[0525] This family is always found associated with GTP_EFTU. This family includes the carboxyl terminal regions 
of Elongation factor G, elongation factor 2 and some tetracycline resistance proteins. 
[0526] 169. (EFP) Elongation factor P signature

Elongation factor P (EF-P) [1] is a prokaryotic protein translation factor required for efficient peptide bond synthesis on 70S ribosomes from fMet-tRNAfMet. EF-P is a protein of 21 Kd. It is evolutionarily related to yeiP, a hypothetical protein from Escherichia coli. As a signature pattern, a conserved region located in the C-terminal part of these proteins was selected.

[0527] Consensus pattern: K-x-[AV]-x(4)-G-x(2)-[LIV]-x-V-P-x(2)-[LIV]-x(2)-G
[ 1] Aoki H., Adams S.-L., Turner M.A., Ganoza M.C. Biochimie 79:7-11(1997).
[0528] 170. (EF_TS) Elongation factor Ts signatures

In prokaryotes elongation factor Ts (EF-Ts) is a component of the elongation cycle of protein biosynthesis. It associates with the EF-Tu.GDP complex and induces the exchange of GDP to GTP. It remains bound to the aminoacyl-tRNA.EF-Tu.GTP complex up to the GTP hydrolysis stage on the ribosome [1]. EF-Ts is also a component of the chloroplast protein biosynthetic machinery and is encoded in the genome of some algal chloroplasts [2]. It is also present in mitochondria [3]. As signature patterns for EF-Ts, two conserved regions located in the N-terminal part of the protein have been selected.

[0529] Consensus pattern: L-R-x(2)-T-[GSDNQ]-x-[GS]-[LIVMF]-x(0,1)-[DENKAC]-x-K-[KRNEQS]-A-L
[0530] Consensus pattern: E-[LIVM]-[NV]-[SCV]-[QE]-T-D-F-V-[SA]-[KRN]

[ 1] Bubunenko M.G., Kireeva M.L., Gudkov A.T. Biochimie 74:419-425(1992).
[ 2] Kostrzewa M., Zetsche K. Plant Mol. Biol. 23:67-76(1993).
[ 3] Xin H., Woriax V.L., Burkhart W.A., Spremulli L.L. J. Biol. Chem. 270:17243-17249(1995).

[0531] 171. (EMP24_GP25L) emp24/gp25L/p24 family

[0532] Members of this family are implicated in bringing cargo forward from the ER and binding to coat proteins by their cytoplasmic domains. Number of members: 30
[0533] Paccaud JP, Thomas DY, Bergeron JJ, Nilsson T. J Cell Biol 1998;140:751-765.

172. ENV_polyprotein 

ENV polyprotein (coat polyprotein) 

Number of members: 224 

[0534] 173. (ERG4_ERG24) Ergosterol biosynthesis ERG4/ERG24 family signatures

Two fungal enzymes that are involved in ergosterol biosynthesis and act by reducing double bonds in precursors of ergosterol have been shown to be evolutionarily related [1]. These are C-14 sterol reductase (gene ERG24 in budding yeast and erg3 in Neurospora crassa) and C-24(28) sterol reductase (gene ERG4 in budding yeast and sts1 in fission yeast). Their sequences are also highly related to that of chicken lamin B receptor, which is thought to anchor the lamina to the inner nuclear membrane. These proteins are highly hydrophobic and seem to contain seven or eight transmembrane regions. As signature patterns, two conserved regions were selected. The first one is apparently located in a loop between the fourth and fifth transmembrane regions and the second is in the C-terminal section.
[0535] Consensus pattern: G-x(2)-[LIVM]-[YH]-D-x-[FYW]-x-G-x(2)-L-N-P-R
Consensus pattern: [LIVM](2)-H-R-x(2)-R-D-x(3)-C-x(2)-K-Y-G

[ 1] Lai M.H., Bard M., Pierson C.A., Alexander J.F., Goebl M., Carter G.T., Kirsch D.R. Gene 140:41-49(1994).
[0536] 174. (ERM) Ezrin/radixin/moesin family 

[0537] This family of proteins contains a band 4.1 domain (Band_41) at their amino terminus. This family represents the rest of these proteins.

[0538] [1] Yonemura S, Hirao M, Doi Y, Takahashi N, Kondo T, Tsukita S. J Cell Biol 1998;140:885-895.

[0539] 175. ER lumen protein retaining receptor signatures

Proteins that reside in the lumen of the endoplasmic reticulum (ER) contain a C-terminal tetrapeptide (generally K-D-E-L or H-D-E-L) that serves as a signal for their retrieval (retrograde transport) from subsequent compartments of the secretory pathway. The signal is recognized by a receptor molecule that is believed to cycle between the cis side of the Golgi apparatus and the ER [1]. This protein is known as the ER lumen protein retaining receptor or also as the 'KDEL receptor'. It has been characterized in a variety of species, including fungi (gene ERD2), plants, Plasmodium, Drosophila and mammals. In mammals two highly related forms of the receptor are known. Structurally, the receptor is a protein of about 220 residues that seems to contain seven transmembrane regions [2]. The N-terminal part (3 residues) is oriented toward the lumen while the C-terminal tail (about 12 residues) is cytoplasmic. There are three lumenal and three cytoplasmic loops. Two signature patterns for these receptors were developed. The first pattern corresponds to the C-terminal half of the first cytoplasmic loop as well as most of the second transmembrane domain. The second pattern is a perfectly conserved decapeptide that corresponds to the central part of the fifth transmembrane domain.

[0540] Consensus pattern: G-I-S-x-[KR]-x-Q-x-L-[FY]-x-[LIV](2)-F-x(2)-R-Y
Consensus pattern: L-E-[SA]-V-A-I-[LM]-P-Q-L

[ 1] Pelham H.R.B. Curr. Opin. Cell Biol. 3:585-591(1991).
[ 2] Townsley F.M., Wilson D.W., Pelham H.R.B. EMBO J. 12:2821-2829(1993).
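
The retrieval signal described in this entry is a tetrapeptide at the very C-terminus of the cargo protein, so checking for it amounts to a suffix test. A minimal sketch in Python follows; the function name and the example peptides are assumptions made for illustration, and only the two tetrapeptides named in the text (K-D-E-L and H-D-E-L) are considered.

```python
# Tetrapeptides named in the text as the usual ER retrieval signals.
RETENTION_SIGNALS = ("KDEL", "HDEL")


def has_er_retention_signal(sequence: str) -> bool:
    """Return True if the sequence ends with K-D-E-L or H-D-E-L."""
    # str.endswith accepts a tuple and checks each suffix in turn.
    return sequence.strip().upper().endswith(RETENTION_SIGNALS)


if __name__ == "__main__":
    # Illustrative peptides, not taken from the specification.
    print(has_er_retention_signal("MKLSAAGTEEKDEL"))   # signal at the C-terminus
    print(has_er_retention_signal("MKDELSAAGTEE"))     # internal KDEL does not count
```

Because the text says the tetrapeptide is only "generally" one of these two, a real scan would probably need a wider list of variants; that list is not given here and is left out of the sketch.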

[0541] 176. (ETF_beta) Electron transfer flavoprotein beta-subunit signature 

The electron transfer flavoprotein (ETF) [1,2] serves as a specific electron acceptor for various mitochondrial dehydrogenases. ETF transfers electrons to the main respiratory chain via ETF-ubiquinone oxidoreductase. ETF is a heterodimer that consists of an alpha and a beta subunit and binds one molecule of FAD per dimer. A similar system also exists in some bacteria. The beta subunit of ETF is a protein of about 28 Kd which is structurally related to the bacterial nitrogen fixation protein fixA, which could play a role in a redox process and feed electrons to ferredoxin. Other related proteins are:

- Escherichia coli hypothetical protein ydiQ.
- Escherichia coli hypothetical protein ygcR.

As a signature pattern for these proteins, a conserved region which is located in the central section was selected.
[0542] Consensus pattern: [IVA]-x-[KR]-x(2)-[DE]-[GD]-[GDE]-x(1,2)-[EQ]-x-[LIV]-x(4)-P-x-[LIVM](2)-[TAC]

[ 1] Finocchiaro G., Ikeda Y., Ito M., Tanaka K. Prog. Clin. Biol. Res. 321:637-652(1990).
[ 2] Tsai M.H., Saier M.H. Jr. Res. Microbiol. 146:397-404(1995).

[0543] 177. Endonuclease III signatures 

Escherichia coli endonuclease III (EC 4.2.99.18) (gene nth) [1] is a DNA repair enzyme that acts both as a DNA N-glycosylase, removing oxidized pyrimidines from DNA, and as an apurinic/apyrimidinic (AP) endonuclease, introducing a single-strand nick at the site from which the damaged base was removed. Endonuclease III is an iron-sulfur protein that binds a single 4Fe-4S cluster. The 4Fe-4S cluster does not seem to be important for catalytic activity, but is probably involved in the proper positioning of the enzyme along the DNA strand [2]. Endonuclease III is evolutionarily related to the following proteins:

- Fission yeast endonuclease III homolog (gene nth1) [3].
- Escherichia coli (and related) DNA repair protein mutY, which is an adenine glycosylase. MutY is a larger protein (350 amino acids) than endonuclease III (211 amino acids).
- Micrococcus luteus ultraviolet N-glycosylase/AP lyase which initiates repair at cis-syn pyrimidine dimers.
- ORF10 in plasmid pFV1 of the thermophilic archaebacteria Methanobacterium thermoformicicum [4]. Restriction methylase m.MthTI, which is encoded by this plasmid, generates 5-methylcytosine which is subject to deamination resulting in G-T mismatches. This protein could correct these mismatches.
- Yeast hypothetical protein YAL015c.
- Fission yeast hypothetical protein SpAC26A3.02.
- Caenorhabditis elegans hypothetical protein R10E4.5.
- Methanococcus jannaschii hypothetical protein MJ0613.

The 4Fe-4S cluster is bound by four cysteines which are all located in a 17 amino acid region at the C-terminal end of endonuclease III. A similar region is also present in the central section of mutY and in the C-terminus of ORF10 and of the Micrococcus UV endonuclease. The 4Fe-4S cluster region does not exist in YAL015c. Two signature patterns for these proteins were developed: the first corresponds to the core of the iron-sulfur binding domain, the second corresponds to the best conserved region in the catalytic core of these enzymes.

[0544] Consensus pattern: C-x(3)-[KRS]-P-[KRAGL]-C-x(2)-C-x(5)-C [The four C's are 4Fe-4S ligands]
Consensus pattern: [GSTl-x-[LlVMn-P-x(5)-[L!VM\An-x(2.3)-[LI]-[PAS]-G-V-[GA]-x(3)-[GAC]-^ 

[LIVMFYW]-[GANK]

[ 1] Kuo C.-F., McRee D., Fisher C.L., O'Handley S.F., Cunningham R.P., Tainer J.A. Science 258:434-440(1992).
[ 2] Thomson A.J. Curr. Biol. 3:173-174(1993).
[ 3] Roldan-Arjona T., Anselmino C., Lindahl T. Nucleic Acids Res. 3307-3312(1996).
[ 4] Noelling J., van Eeden F.J.M., Eggen R.I.L., de Vos W.M. Nucleic Acids Res. 20:6501-6507(1992).

[0545] 178. (Epimerase) NAD dependent epimerase/dehydratase family 

[0546] This family of proteins utilizes NAD as a cofactor. The proteins in this family use nucleotide-sugar substrates for a variety of chemical reactions.
[0547] [1] Thoden JB, Hegeman AD, Wesenberg G, Chapeau MC, Frey PA, Holden HM. Biochemistry 1997;36:6294-6304.

[0548] 179. Exonuclease

[0549] This family includes a variety of exonuclease proteins, such as ribonuclease T and the epsilon subunit of DNA 
polymerase III. 

[0550] [1] Koonin EV, Deutscher MP. Nucleic Acids Res 1993;21:2521-2522.
[0551] 180. ENTH 
ENTH domain 

[0552] [1] Kay BK, Yamabhai M, Wendland B, Emr SD; Medline: 99156083, Identification of a novel domain shared by putative components of the endocytic and cytoskeletal machinery. Protein Sci 1999;8:435-438.
[0553] The ENTH (Epsin N-terminal homology) domain is found in proteins involved in endocytosis and cytoskeletal
machinery. The function of the ENTH domain is unknown. 
[0554] Number of members: 29 

[0555] 181. (eIF-1A) Eukaryotic initiation factor 1A signature

Eukaryotic translation initiation factor 1A (eIF-1A) [1] (formerly known as eIF-4C) is a protein that seems to be required for maximal rate of protein biosynthesis. It enhances ribosome dissociation into subunits and stabilizes the binding of the initiator Met-tRNA to 40S ribosomal subunits. eIF-1A is a hydrophilic protein of about 15 to 17 Kd. Archaebacteria also seem to possess an eIF-1A homolog. As a signature pattern, a conserved region in the central section of these proteins was selected.

[0556] Consensus pattern: [IM]-x-G-x-[GS]-[KRH]-x(4)-[CL]-x-D-G-x(2)-R-x(2)-[RH]-I-x-G
[0557] [ 1] Wei C.-L., Kainuma M., Hershey J.W.B. J. Biol. Chem. 270:22788-22794(1995).
[0558] 182. (eIF-5A) Eukaryotic initiation factor 5A hypusine signature

Eukaryotic initiation factor 5A (eIF-5A) (formerly known as eIF-4D) [1,2] is a small protein whose precise role in the initiation of protein synthesis is not known. It appears to promote the formation of the first peptide bond. eIF-5A seems to be the only eukaryotic protein to contain a hypusine residue. Hypusine is derived from lysine by the post-translational addition of a butylamino group (from spermidine) to the epsilon-amino group of lysine. The hypusine group is essential to the function of eIF-5A. A hypusine-containing protein has been found in archaebacteria such as Sulfolobus acidocaldarius or Methanococcus jannaschii; this protein is highly similar to eIF-5A and could play a similar role in protein biosynthesis. The signature developed for eIF-5A is centered around the hypusine residue.
[0559] Consensus pattern: [PT]-G-K-H-G-x-A-K [The first K is modified to hypusine]

[ 1] Park M.H., Wolff E.G., Folk J.E. Biofactors 4:95-104(1993).
[ 2] Schnier J., Schwelberger H.G., Smit-McBride Z., Kang H.A., Hershey J.W.B. Mol. Cell. Biol. 11:3105-3114(1991).
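
Since the signature is anchored on the modified residue, a match can be turned directly into a candidate hypusine position: the first K is the third residue of the pattern. The following Python is a minimal sketch of that idea. The regular expression is a hand translation of [PT]-G-K-H-G-x-A-K, and the function name and test peptide are assumptions made for illustration only.

```python
import re

# Hand translation of the consensus pattern [PT]-G-K-H-G-x-A-K.
HYPUSINE_SIGNATURE = re.compile(r"[PT]GKHG.AK")

# The first K (the residue modified to hypusine) is the third pattern position.
FIRST_K_OFFSET = 2


def hypusine_sites(sequence: str) -> list:
    """Return 1-based positions of the lysine that the pattern marks as hypusine."""
    return [m.start() + FIRST_K_OFFSET + 1
            for m in HYPUSINE_SIGNATURE.finditer(sequence)]


if __name__ == "__main__":
    # Illustrative peptide (an assumption made for this example only).
    print(hypusine_sites("STSKTGKHGHAKVHL"))
```

A match only flags a lysine in the right sequence context; whether that residue is actually modified in a given protein is an experimental question the pattern cannot answer.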
[0560] 183. (efhand) S-100/ICaBP type calcium binding protein signature

S-100 are small dimeric acidic calcium and zinc-binding proteins [1] abundant in the brain. They have two different types of calcium-binding sites: a low affinity one with a special structure and a 'normal' EF-hand type high affinity site. The vitamin-D dependent intestinal calcium-binding proteins (ICaBP or calbindin 9 Kd) also belong to this family of proteins, but they do not form dimers. In the past years the sequences of many new members of this family have been determined (for reviews see [2,3,4]); in most cases the function of these proteins is not yet known, although it is becoming clear that they are involved in cell growth and differentiation, cell cycle regulation and metabolic control. These proteins are:

- Calcyclin (Prolactin receptor associated protein (PRA); ciatropin; 2a9; 5B10; S100A6).
- Calpactin I light chain (p10; p11; 42c; S100A10).
- Calgranulin A (cystic fibrosis antigen (CFAg); MIF related protein 8 (MRP-8); p8; S100A8).
- Calgranulin B (MIF related protein 14 (MRP-14); p14; S100A9).
- Calgranulin C.
- Calgizzarin (S100C).
- Placental calcium-binding protein (CAPL) (18a2; peL98; 42a; p9K; MTS1; metastatin; S100A4).
- Protein S-100D (S100A5).
- Protein S-100E (S100A3).
- Protein S-100L (CAN19; S100A2).
- Placental protein S-100P (S100E).
- Psoriasin (S100A7).
- Chemotactic cytokine CP-10 [5].
- Protein MRP-126 [6].
- Trichohyalin [7]. This is a large intermediate filament-associated protein that associates with keratin intermediate filaments (KIF); it contains a S-100 type domain in its N-terminal extremity.

A number of these proteins are known to bind calcium while others are not (p10 for example). Our EF-hand detecting pattern will fail to pick those proteins which have lost their calcium-binding properties. A pattern was developed which unambiguously picks up proteins belonging to this family. This pattern spans the region of the EF-hand high affinity site but makes no assumptions on the calcium-binding properties of this site.

[0561] Consensus pattern: [LIVMFYW](2)-x(2)-[LK]-D-x(3)-[DN]-x(3)-[DNSG]-[FY]-x-[ES]-[FYVC]-x(2)-[LIVMFS]-[LIVMF]

[ 1] Baudier J. (In) Calcium and Calcium Binding proteins, Gerday C., Bollis L., Giller R., Eds., pp102-113, Springer Verlag, Berlin, (1988).
[ 2] Moncrief N.D., Kretsinger R.H., Goodman M. J. Mol. Evol. 30:522-562(1990).
[ 3] Kligman D., Hilt D.C. Trends Biochem. Sci. 13:437-443(1988).
[ 4] Schaefer B.W., Wicki R., Engelkamp D., Mattei M.-G., Heizmann C.W. Genomics 25:638-643(1995).
[ 5] Lackmann M., Cornish C.J., Simpson R.J., Moritz R.L., Geczy C.L. J. Biol. Chem. 267:7499-7504(1992).
[ 6] Nakano T., Graf T. Oncogene 7:527-534(1992).
[ 7] Lee S.-C., Kim I.-G., Marekov L.N., O'Keefe E.J., Parry D.A.D., Steinert P.M. J. Biol. Chem. 268:12164-12176(1993).

EF-hand calcium-binding domain 

Many calcium-binding proteins belong to the same evolutionary family and share a type of calcium-binding domain known as the EF-hand [1 to 5]. This type of domain consists of a twelve residue loop flanked on both sides by a twelve residue alpha-helical domain. In an EF-hand loop the calcium ion is coordinated in a pentagonal bipyramidal configuration. The six residues involved in the binding are in positions 1, 3, 5, 7, 9 and 12; these residues are denoted by X, Y, Z, -Y, -X and -Z. The invariant Glu or Asp at position 12 provides two oxygens for liganding Ca (bidentate ligand). Listed below are the proteins which are known to contain EF-hand regions. For each type of protein the total number of EF-hand regions known or supposed to exist is indicated between parentheses. This number does not include regions which clearly have lost their calcium-binding properties, or the atypical low-affinity site (which spans thirteen residues) found in the S-100/ICaBP family of proteins [6].



- Aequorin and Renilla luciferin binding protein (LBP) (Ca=3).
- Alpha actinin (Ca=2).
- Calbindin (Ca=4).
- Calcineurin B subunit (protein phosphatase 2B regulatory subunit) (Ca=4).
- Calcium-binding protein from Streptomyces erythraeus (Ca=3?).
- Calcium-binding protein from Schistosoma mansoni (Ca=2?).
- Calcium-binding proteins TCBP-23 and TCBP-25 from Tetrahymena thermophila (Ca=4?).
- Calcium-dependent protein kinases (CDPK) from plants (Ca=4).
- Calcium vector protein from amphioxus (Ca=2).
- Calcyphosin (thyroid protein p24) (Ca=4?).
- Calmodulin (Ca=4, except in yeast where Ca=3).
- Calpain small and large chains (Ca=2).
- Calretinin (Ca=6).
- Calcyclin (prolactin receptor associated protein) (Ca=2).
- Caltractin (centrin) (Ca=2 or 4).
- Cell Division Control protein 31 (gene CDC31) from yeast (Ca=2?).
- Diacylglycerol kinase (EC 2.7.1.107) (DGK) (Ca=2).
- FAD-dependent glycerol-3-phosphate dehydrogenase (EC 1.1.99.5) from mammals (Ca=1).
- Fimbrin (plastin) (Ca=2).
- Flagellar calcium-binding protein (1f8) from Trypanosoma cruzi (Ca=1 or 2).
- Guanylate cyclase activating protein (GCAP) (Ca=3).
- Inositol phospholipid-specific phospholipase C isozymes gamma-1 and delta-1 (Ca=2) [10].
- Intestinal calcium-binding protein (ICaBPs) (Ca=2).
- MIF related proteins 8 (MRP-8 or CFAG) and 14 (MRP-14) (Ca=2).
- Myosin regulatory light chains (Ca=1).
- Oncomodulin (Ca=2).
- Osteonectin (basement membrane protein BM-40) (SPARC) and proteins that contain an 'osteonectin' domain (QR1, matrix glycoprotein SC1) (see the entry <PDOC00535>) (Ca=1).
- Parvalbumins alpha and beta (Ca=2).
- Placental calcium-binding protein (18a2) (nerve growth factor induced protein 42a) (p9k) (Ca=2).
- Recoverins (visinin, hippocalcin, neurocalcin, S-modulin) (Ca=2 to 3).
- Reticulocalbin (Ca=4).
- S-100 protein, alpha and beta chains (Ca=2).
- Sarcoplasmic calcium-binding protein (SCPs) (Ca=2 to 3).
- Sea urchin proteins Spec 1 (Ca=4), Spec 2 (Ca=4?), Lps-1 (Ca=8).
- Serine/threonine protein phosphatase rdgc (EC 3.1.3.16) from Drosophila (Ca=2).
- Sorcin V19 from hamster (Ca=2).
- Spectrin alpha chain (Ca=2).
- Squidulin (optic lobe calcium-binding protein) from squid (Ca=4).
- Troponins C; from skeletal muscle (Ca=4), from cardiac muscle (Ca=3), from arthropods and molluscs (Ca=2).

There have been a number of attempts [7,8] to develop patterns that pick up EF-hand regions, but these studies were made a few years ago when not so many different families of calcium-binding proteins were known. Therefore a new pattern was developed which takes into account all published sequences. This pattern includes the complete EF-hand loop as well as the first residue which follows the loop and which seems to always be hydrophobic.

- Consensus pattern: D-x-[DNS]-{ILVFYW}-[DENSTG]-[DNQGHRK]-{GP}-[LIVMC]-[DENQSTAGC]-x(2)-[DE]-[LIVMFYW]

- Note: positions 1 (X), 3 (Y) and 12 (-Z) are the most conserved. 

- Note: the 6th residue in an EF-hand loop is, in most cases, a Gly, but the number of exceptions to this 'rule' has gradually increased and therefore the pattern should include all the different residues which have been shown to exist in this position in functional Ca-binding sites.

- Note: the pattern will, in some cases, miss one of the EF-hand regions in some proteins with multiple EF-hand 
domains. 
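
To make the loop numbering concrete, the following Python is a minimal sketch that scans a sequence with the EF-hand consensus pattern and reports, for each hit, the residues at loop positions 1, 3, 5, 7, 9 and 12 under their X, Y, Z, -Y, -X and -Z labels. The helper names and the test peptide are illustrative assumptions; the pattern string and the position labels are taken from the text above.

```python
import re

EF_HAND = ("D-x-[DNS]-{ILVFYW}-[DENSTG]-[DNQGHRK]-{GP}-[LIVMC]-"
           "[DENQSTAGC]-x(2)-[DE]-[LIVMFYW]")

# Loop positions of the calcium ligands and their conventional labels.
LIGAND_LABELS = {1: "X", 3: "Y", 5: "Z", 7: "-Y", 9: "-X", 12: "-Z"}

ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def to_regex(pattern: str) -> str:
    """Translate the consensus notation into a Python regular expression."""
    out = []
    for element in pattern.split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                          # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # excluded residues
        if low is not None:
            core += "{" + (low if high is None else low + "," + high) + "}"
        out.append(core)
    return "".join(out)


def find_ef_hand_loops(sequence: str) -> list:
    """Return (1-based start, 12-residue loop, ligand residues) for each match."""
    hits = []
    for m in re.finditer(to_regex(EF_HAND), sequence):
        loop = m.group(0)[:12]   # the 13th matched residue follows the loop
        ligands = {label: loop[pos - 1] for pos, label in LIGAND_LABELS.items()}
        hits.append((m.start() + 1, loop, ligands))
    return hits


if __name__ == "__main__":
    # Illustrative calmodulin-like peptide (an assumption made for this example).
    for start, loop, ligands in find_ef_hand_loops("FSLFDKDGDGTITTKELGTV"):
        print(start, loop, ligands)
```

As the last note above says, the pattern can miss individual EF-hand regions, so an empty result from a scan like this does not show that a protein lacks the domain.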

[ 1] Kawasaki H., Kretsinger R.H. Protein Prof. 2:305-490(1995).
[ 2] Kretsinger R.H. Cold Spring Harbor Symp. Quant. Biol. 52:499-510(1987).
[ 3] Moncrief N.D., Kretsinger R.H., Goodman M. J. Mol. Evol. 30:522-562(1990).
[ 4] Nakayama S., Moncrief N.D., Kretsinger R.H. J. Mol. Evol. 34:416-448(1992).
[ 5] Heizmann C.W., Hunziker W. Trends Biochem. Sci. 16:98-103(1991).
[ 6] Kligman D., Hilt D.C. Trends Biochem. Sci. 13:437-443(1988).
[ 7] Strynadka N.C.J., James M.N.G. Annu. Rev. Biochem. 58:951-98(1989).
[ 8] Haiech J., Sallantin J. Biochimie 67:555-560(1985).
[ 9] Chauvaux S., Beguin P., Aubert J.-P., Bhat K.M., Gow L.A., Wood T.M., Bairoch A. Biochem. J. 265:261-265(1990).
[10] Bairoch A., Cox J.A. FEBS Lett. 269:454-456(1990).
[0562] 184. Enolase signature

Enolase (EC 4.2.1.11) is a glycolytic enzyme that catalyzes the dehydration of 2-phospho-D-glycerate to phosphoenolpyruvate [1]. It is a dimeric enzyme that requires magnesium both for catalysis and for stabilizing the dimer. Enolase is probably found in all organisms that metabolize sugars. In vertebrates, there are three different tissue-specific isozymes: alpha present in most tissues, beta in muscles and gamma found only in nervous tissues. Tau-crystallin, one of the major lens proteins in some fish, reptiles and birds, has been shown [2] to be evolutionarily related to enolase. As a signature pattern for enolase, the best conserved region was selected; it is located in the C-terminal third of the sequence.

[0563] Consensus pattern: [LIV](3)-K-x-N-Q-I-G-[ST]-[LIV]-[ST]-[DE]-[STA]
[ 1] Lebioda L., Stec B., Brewer J.M. J. Biol. Chem. 264:3685-3693(1989).
[ 2] Wistow G., Piatigorsky J. Science 236:1554-1556(1987).
[0564] 185. (F-actin_cap_A) F-actin capping protein alpha subunit signatures

The F-actin capping protein binds in a calcium-independent manner to the fast growing ends of actin filaments (barbed end), thereby blocking the exchange of subunits at these ends. Unlike gelsolin and severin this protein does not sever actin filaments. The F-actin capping protein is a heterodimer composed of two unrelated subunits: alpha and beta. The alpha subunit is a protein of about 268 to 286 amino acid residues whose sequence is well conserved in eukaryotic species [1]. As signature patterns two highly conserved regions in the C-terminal section of the alpha subunit were selected.

[0565] Consensus pattern: V-H-[FY](2)-E-D-G-N-V
Consensus pattern: F-K-[AE]-L-R-R-x-L-P
[0566] [ 1] Cooper J.A., Caldwell J.E., Gattermeir D.J., Torres M.A., Amatruda J.F., Casella J.F. Cell Motil. Cytoskeleton 18:204-214(1991).
[0567] 186. F-box domain 

[0568] [1] Bai C, Sen P, Hofmann K, Ma L, Goebl M, Harper JW, Elledge SJ. Cell 1996;86:263-274. [2] Skowyra D, Craig KL, Tyers M, Elledge SJ, Harper JW. Cell 1997;91:209-219.
[0569] 187. F-protein

Negative factor, (F Protein) or Nef. 

[0570] [1] Arold S, Franken P, Strub M-P, Hoh F, Benichou S, Benarous R, Dumas C; Medline: 98035457, The crystal structure of HIV-1 Nef protein bound to the Fyn kinase SH3 domain suggests a role for this complex in altered T cell receptor signalling. Structure 1997;5:1361-1372.
[0571] Nef protein accelerates virulent progression of AIDS by its interaction with cellular proteins involved in signal
transduction and host cell activation. Nef has been shown to bind specifically to a subset of the Src kinase family. 
[0572] Number of members: 1013 
[0573] 188. (FAD_binding_2) Fumarate reductase / succinate dehydrogenase FAD-binding site

In bacteria two distinct, membrane-bound, enzyme complexes are responsible for the interconversion of fumarate and succinate (EC 1.3.99.1): fumarate reductase (Frd) is used in anaerobic growth, and succinate dehydrogenase (Sdh) is used in aerobic growth. Both complexes consist of two main components: a membrane-extrinsic component composed of a FAD-binding flavoprotein and an iron-sulfur protein; and a hydrophobic component composed of a membrane anchor protein and/or a cytochrome B.
[0574] In eukaryotes mitochondrial succinate dehydrogenase (ubiquinone) (EC 1.3.5.1) is an enzyme composed of two subunits: a FAD flavoprotein and an iron-sulfur protein.

[0575] The flavoprotein subunit is a protein of about 60 to 70 Kd to which FAD is covalently bound at a histidine residue which is located in the N-terminal section of the protein [1]. The sequence around that histidine is well conserved in Frd and Sdh from various bacterial and eukaryotic species [2] and can be used as a signature pattern.
[0576] Consensus pattern: R-[ST]-H-[ST]-x(2)-A-x-G-G [H is the FAD binding site]. Sequences known to belong to this class detected by the pattern: ALL.

[ 1] Blaut M., Whittaker K., Valdovinos A., Ackrell B.A., Gunsalus R.P., Cecchini G. J. Biol. Chem. 264:13599-13604(1989).
[ 2] Birch-Machin M.A., Farnsworth L., Ackrell B.A., Cochran B., Jackson S., Bindoff L.A., Aitken A., Diamond A.G., Turnbull D.M. J. Biol. Chem. 267:11553-11558(1992).

[0577] 189. Fatty acid desaturases signatures (FA_desaturase)

Fatty acid desaturases (EC 1.14.99.-) are enzymes that catalyze the insertion of a double bond at the delta position of fatty acids. There seem to be two distinct families of fatty acid desaturases which do not seem to be evolutionarily related.

Family 1 is composed of:

- Stearoyl-CoA desaturase (SCD) (EC 1.14.99.5) [1]. SCD is a key regulatory enzyme of unsaturated fatty acid biosynthesis. SCD introduces a cis double bond at the delta(9) position of fatty acyl-CoA's such as palmitoleoyl- and oleoyl-CoA. SCD is a membrane-bound enzyme that is thought to function as a part of a multienzyme complex in the endoplasmic reticulum of vertebrates and fungi.

As a signature pattern for this family a conserved region in the C-terminal part of these enzymes was selected; this region is rich in histidine residues and in aromatic residues.

Family 2 is composed of:

- Plants stearoyl-acyl-carrier-protein desaturase (EC 1.14.99.6) [2]; these enzymes catalyze the introduction of a double bond at the delta(9) position of stearoyl-ACP to produce oleoyl-ACP. This enzyme is responsible for the conversion of saturated fatty acids to unsaturated fatty acids in the synthesis of vegetable oils.
- Cyanobacteria desA [3], an enzyme that can introduce a second cis double bond at the delta(12) position of fatty acid bound to membrane glycerolipids. DesA is involved in chilling tolerance; the phase transition temperature of lipids of cellular membranes being dependent on the degree of unsaturation of fatty acids of the membrane lipids.

As a signature pattern for this family a conserved region in the C-terminal part of these enzymes was selected.

[0578] Consensus pattern: G-E-x-[FY]-H-N-[FY]-H-H-x-F-P-x-D-Y
Consensus pattern: [ST]-[SA]-x(3)-[QR]-[LI]-x(5,6)-D-Y-x(2)-[LIVMI^-[LIVM]-[DE]

[ 1] Kaestner K.H., Ntambi J.M., Kelly T.J. Jr., Lane M.D. J. Biol. Chem. 264:14755-14761(1989).
[ 2] Shanklin J., Somerville C.R. Proc. Natl. Acad. Sci. U.S.A. 88:2510-2514(1991).
[ 3] Wada H., Gombos Z., Murata N. Nature 347:200-203(1990).

[0579] 190. Fructose-1-6-bisphosphatase active site (FBPase)

Fructose-1,6-bisphosphatase (EC 3.1.3.11) (FBPase) [1], a regulatory enzyme in gluconeogenesis, catalyzes the
hydrolysis of fructose 1,6-bisphosphate to fructose 6-phosphate. It is involved in many different metabolic pathways and
found in most organisms. Sedoheptulose-1,7-bisphosphatase (EC 3.1.3.37) (SBPase) [2] is an enzyme found in plant
chloroplasts and in photosynthetic bacteria that catalyzes the hydrolysis of sedoheptulose 1,7-bisphosphate to
sedoheptulose 7-phosphate, a step in the Calvin's reductive pentose phosphate cycle. It is functionally and structurally
related to FBPase. In mammalian FBPase, a lysine residue has been shown to be involved in the catalytic mechanism
[3]. The region around this residue is highly conserved and can be used as a signature pattern for FBPase and SBPase.
It must be noted that, in some bacterial FBPase sequences, the active site lysine is replaced by an arginine.
Consensus pattern: [AG]-[RK]-L-x(1,2)-[LIV]-[FY]-E-x(2)-P-[LIVM]-[GSA] [K/R is the active site residue]-

[ 1] Benkovic S.J., DeMaine M.M. Adv. Enzymol. 53:45-82(1982).
[ 2] Raines C.A., Lloyd J.C., Willingham N.M., Potts S., Dyer T.A. Eur. J. Biochem. 205:1053-1059(1992).
[ 3] Ke H., Thorpe C.M., Seaton B.A., Lipscomb W.N., Marcus F. J. Mol. Biol. 212:513-539(1989).

[0580] 191. FGGY family of carbohydrate kinases signatures

It has been shown [1] that four different types of carbohydrate kinases seem to be evolutionarily related. These enzymes
are: - L-fuculokinase (EC 2.7.1.51) (gene fucK). - Gluconokinase (EC 2.7.1.12) (gene gntK). - Glycerokinase (EC
2.7.1.30) (gene glpK). - Xylulokinase (EC 2.7.1.17) (gene xylB). - L-xylulose kinase (EC 2.7.1.53) (gene lyxK). These
enzymes are proteins of from 480 to 520 amino acid residues. As consensus patterns for this family of kinases two
conserved regions were selected, one in the central section, the other in the C-terminal section.
[0581] Consensus pattern: [MFYGS]-x-[PST]-x(2)-K-[LIVMFYW]-x-W-[LIVMF]-x-[DENQ^^^^^^
Consensus pattern: [GSA]-x-[LIVMFYW]-x-G-[LIVM]-x(7,8)-[HDENQ]-[LIVMF]-x(2)-[AS]-[^^^^^^^
[0582] [ 1] Reizer A., Deutscher J., Saier M.H. Jr., Reizer J. Mol. Microbiol. 5:1081-1089(1991).
[0583] 192. FKBP-type peptidyl-prolyl cis-trans isomerase signatures/profile (FKBP)
FKBP [1,2,3] is the major high-affinity binding protein, in vertebrates, for the immunosuppressive drug FK506. It exhibits
peptidyl-prolyl cis-trans isomerase activity (EC 5.2.1.8) (PPIase or rotamase). PPIase is an enzyme that accelerates
protein folding by catalyzing the cis-trans isomerization of proline imidic peptide bonds in oligopeptides [4]. At least
three different forms of FKBP are known in mammalian species: - FKBP-12, which is cytosolic and inhibited by both
FK506 and rapamycin. - FKBP-13, which is membrane associated and inhibited by both FK506 and rapamycin. - FKBP-25,
which is preferentially inhibited by rapamycin. These forms of FKBP are evolutionarily related and show
similarities [5,6,7] with the following proteins: - Fungal FKBP. - Mammalian hsp binding immunophilin (HBI) (also known
as p59). HBI is a protein which binds to hsp90 and contains two FKBP-like domains in its N-terminal section, the first of
which seems to be functional. - The C-terminal part of the cell-surface protein mip from Legionella, a protein associated
with macrophage infection by an unknown mechanism. - Escherichia coli slyD [8], a protein with an N-terminal FKBP-like
domain followed by a histidine-rich metal-binding domain. - Escherichia coli fkpA. - Escherichia coli fklB (FKBP22).
- Escherichia coli slpA. - Bacterial trigger factor (Tig). - Streptomyces hygroscopicus and chrysomallus FK506-binding
protein. - Chlamydia trachomatis 27 Kd membrane protein. - Neisseria meningitidis strain C114 PPIase. - Probable
PPIases from Haemophilus influenzae (HI0754), Methanococcus jannaschii (MJ0278 and MJ0825), Pseudomonas
fluorescens and Pseudomonas aeruginosa. Two signature patterns for these proteins were developed. One is based
on a conserved region in the N-terminus of FKBP, the other is located in the central section. The profile for FKBP spans

[0584] Consensus pattern: [LIVMC]-x-[Yn-x-[GVL]-x(1,2)-[LFT]-x(2)-G-x(3)-[DE]-[STAEQK]-[S^^^^^^^^^

[0585] Consensus pattern: [LIVMFY]-x(2)-[GA]-x(3,4)-[LIVMF]-x(2)-[LIVMFHK]-x(2)-G-x(4)-[LIVMF]-x(3)-[PSGAQ]-x(2)-[AG]-[FY]-G-

[ 1] Tropschug M., Wachter E., Mayer S., Schoenbrunner E.R., Schmid F.X. Nature 346:674-677(1990).
[ 2] Stein R.L. Curr. Biol. 1:234-236(1991).
[ 3] Siekierka J.J., Wiederrecht G., Greulich H., Boulton D., Hung S.H.Y., Cryan J., Hodges R.J., Sigal N.H. J. Biol.
Chem. 265:21011-21015(1990).
[ 4] Fischer G., Schmid F.X. Biochemistry 29:2205-2212(1990).
[ 5] Trandinh C.C., Pao G.M., Saier M.H. Jr. FASEB J. 6:3410-3420(1992).
[ 6] Galat A. Eur. J. Biochem. 216:689-707(1993).
[ 7] Hacker J., Fischer G. Mol. Microbiol. 10:445-456(1993).
[ 8] Wuelfing C., Lombardero J., Plueckthun A. J. Biol. Chem. 269:2895-2901(1994).

[0586] 193. MAPEG family (aka: FLAP/GST2/LTC4S family signature)
[0587] The following mammalian proteins are evolutionarily related [1]:

Leukotriene C4 synthase (EC 2.5.1.37) (gene LTC4S), an enzyme that catalyzes the production of LTC4 from LTA4.
Microsomal glutathione S-transferase II (EC 2.5.1.18) (GST-II) (gene GST2), an enzyme that can also produce
LTC4 from LTA4.

5-lipoxygenase activating protein (gene FLAP), a protein that seems to be required for the activation of
5-lipoxygenase.

[0588] These are proteins of 150 to 160 residues that contain three transmembrane segments. As a signature pattern,
a conserved region between the first and second transmembrane domains was selected.
[0589] Consensus pattern: G-x(3)-F-E-R-V-[FY]-x-A-[NQ]-x-N-C

[0590] [1] Jakobsson P.-J., Mancini J.A., Ford-Hutchinson A.W. J. Biol. Chem. 271:22203-22210(1996).
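
The consensus patterns in this section use the PROSITE-style notation: `x` for any residue, `[..]` for a set of allowed residues, `{..}` for excluded residues, and `(n)` or `(n,m)` for repeats. As a minimal sketch of how such a pattern could be applied to a sequence, the following Python translates the notation into a regular expression and searches a sequence with it. The function name `prosite_to_regex` and the test sequence are illustrative assumptions and are not part of the original disclosure.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    tokens = []
    # Elements are separated by '-'; a trailing '-' (as printed in the text) is ignored.
    for element in pattern.strip().rstrip("-").split("-"):
        element = element.strip()
        # Separate the residue specification from an optional repeat: (n) or (n,m).
        parsed = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if parsed is None:
            raise ValueError(f"empty pattern element in {pattern!r}")
        core, low, high = parsed.groups()
        if core == "x":                            # any residue
            token = "."
        elif re.fullmatch(r"\[[A-Z]+\]", core):    # one of the listed residues
            token = core
        elif re.fullmatch(r"\{[A-Z]+\}", core):    # any residue except those listed
            token = "[^" + core[1:-1] + "]"
        elif re.fullmatch(r"[A-Z]", core):         # a single fixed residue
            token = core
        else:
            raise ValueError(f"unrecognized pattern element: {element!r}")
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        tokens.append(token)
    return "".join(tokens)


MAPEG_PATTERN = "G-x(3)-F-E-R-V-[FY]-x-A-[NQ]-x-N-C"
MAPEG_REGEX = prosite_to_regex(MAPEG_PATTERN)

# Made-up test sequence with the motif embedded at index 2; not a real protein.
hit = re.search(MAPEG_REGEX, "MKGAAAFERVYLANANCLL")
print(MAPEG_REGEX, hit.start() if hit else None)
```

Patterns that are damaged or truncated in the text are rejected with a `ValueError` rather than silently matched, which seems the safer choice when scanning real sequences.
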
[0591] 194. FMN-dependent alpha-hydroxy acid dehydrogenases active site (FMN_dh)

A number of oxidoreductases that act on alpha-hydroxy acids and which are FMN-containing flavoproteins have been
shown [1,2,3] to be structurally related; these enzymes are: - Lactate dehydrogenase (EC 1.1.2.3), which consists of
a dehydrogenase domain and a heme-binding domain called cytochrome b2 and which catalyzes the conversion of
lactate into pyruvate. - Glycolate oxidase (EC 1.1.3.15) ((S)-2-hydroxy-acid oxidase), a peroxisomal enzyme that
catalyzes the conversion of glycolate and oxygen to glyoxylate and hydrogen peroxide. - Long chain alpha-hydroxy acid
oxidase from rat (EC 1.1.3.15), a peroxisomal enzyme. - Lactate 2-monooxygenase (EC 1.13.12.4) (lactate oxidase)
from Mycobacterium smegmatis, which catalyzes the conversion of lactate and oxygen to acetate, carbon dioxide and
water. - (S)-mandelate dehydrogenase from Pseudomonas putida (gene mdlB), which catalyzes the oxidation of
(S)-mandelate to benzoylformate. The first step in the reaction mechanism of these enzymes is the abstraction of the
proton from the alpha-carbon of the substrate, producing a carbanion which can subsequently attach to the N5 atom
of FMN. A conserved histidine has been shown [4] to be involved in the removal of the proton. The region around this
active site residue is highly conserved and contains an arginine residue which is involved in substrate binding.
[0592] Consensus pattern: S-N-H-G-[AG]-R-Q [H is the active site residue] [R is a substrate-binding residue]-

[ 1] Giegel D.A., Williams C.H. Jr., Massey V. J. Biol. Chem. 265:6626-6632(1990).
[ 2] Tsou A.Y., Ransom S.C., Gerlt J.A., Buechter D.D., Babbitt P.C., Kenyon G.L. Biochemistry 29:9856-9862(1990).
[ 3] Le K.H.D., Lederer F. J. Biol. Chem. 266:20877-20880(1991).
[ 4] Lindqvist Y., Branden C.-I. J. Biol. Chem. 264:3624-3628(1989).

[0593] 195. Flavin-binding monooxygenase-like (FMO-like)
[0594] This family includes FMO proteins and cyclohexanone monooxygenase.
[0595] 196. (FPGS)

Folylpolyglutamate synthase signatures (aka Mur_ligase)

[0596] Folylpolyglutamate synthase (EC 6.3.2.17) (FPGS) [1] is the enzyme of folate metabolism that catalyzes the
ATP-dependent addition of glutamate moieties to tetrahydrofolate.
[0597] Its sequence is moderately conserved between prokaryotes (gene folC) and eukaryotes. We developed two
signature patterns based on the conserved regions, which are rich in glycine residues and could play a role in the
catalytic activity and/or in substrate binding.

[0598] Consensus pattern: [LIVMFY]-x-[LIVM]-[STAG]-G-T-[NK]-G-K-x-[ST]-x(7)-[LIVM](2)-x(3)-[GSK] Sequences
known to belong to this class detected by the pattern: ALL.
[0599] Consensus pattern: [LIVMFY](2)-E-x-G-[LIVM]-[GA]-G-x(2)-D-x-[GST]-x-[LIVM](2) Sequences known to
belong to this class detected by the pattern: ALL.

[0600] [ 1] Shane B., Garrow T., Brenner A., Chen L., Choi Y.J., Hsu J.C., Stover P. Adv. Exp. Med. Biol. 338:629-634
(1993).

[0601] 197. FYVE zinc finger
[0602] The FYVE zinc finger is named after four proteins that it has been found in: Fab1, YOTB/ZK632.12, Vac1,
and EEA1. The FYVE finger has been shown to bind two Zn++ ions [1]. The FYVE finger has eight potential zinc
coordinating cysteine positions. Many members of this family also include two histidines in a motif R+HHC+XCG, where
+ represents a charged residue and X any residue. Members were included which do not conserve these histidine
residues.
[0603] [1] Stenmark H, Aasland R, Toh BH, D'Arrigo A. J Biol Chem 1996;271:24048-24054. [2] Gaullier JM,
Simonsen A, D'Arrigo A, Bremnes B, Stenmark H, Aasland R. Nature 1998;394:432-433.
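
The motif R+HHC+XCG described for the FYVE finger is written in an informal notation rather than as a consensus pattern. A rough sketch of how it might be searched for is given below. It assumes that "charged residue" means Asp, Glu, Lys or Arg (histidine is left out, which is a choice made here and not something the text states), and both the function name `find_fyve_motifs` and the test strings are made up for illustration.

```python
import re

# Assumption: '+' (charged residue) is taken to mean D, E, K or R only.
CHARGED = "DEKR"

# R + H H C + X C G, with X standing for any residue.
FYVE_MOTIF = re.compile(f"R[{CHARGED}]HHC[{CHARGED}].CG")

def find_fyve_motifs(sequence):
    """Return (0-based start, matched text) for each occurrence of the motif."""
    return [(m.start(), m.group()) for m in FYVE_MOTIF.finditer(sequence.upper())]

# Made-up sequences: the first contains the motif, the second has a
# non-charged residue (A) in the first '+' position.
print(find_fyve_motifs("AAARKHHCRACGAA"))
print(find_fyve_motifs("AAARAHHCRACGAA"))
```

Because some family members do not conserve the two histidines, a search of this kind would be expected to miss them; it is a screen for the motif, not for family membership.
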
[0604] 198. F_actin_cap_B

F-actin capping protein beta subunit signature

[0605] The F-actin capping protein binds in a calcium-independent manner to the fast growing ends of actin filaments
(barbed end), thereby blocking the exchange of subunits at these ends. Unlike gelsolin and severin, this protein does
not sever actin filaments. The F-actin capping protein is a heterodimer composed of two unrelated subunits: alpha and
beta.
[0606] The beta subunit is a protein of about 280 amino acid residues whose sequence is well conserved in eukaryotic
species [1]. As a signature pattern a conserved hexapeptide in the N-terminal section of the beta subunit was selected.
[0607] Consensus pattern: C-D-Y-N-R-D Sequences known to belong to this class detected by the pattern: ALL.
[0608] [1] Amatruda J.F., Cannon J.F., Tatchell K., Hug C., Cooper J.A. Nature 344:352-354(1990).
[0609] 199. Isopenicillin N synthetase signatures (Fe_Asc_oxidored)
Isopenicillin N synthetase (IPNS) [1,2] is a key enzyme in the biosynthesis of penicillin and cephalosporin. In the
presence of oxygen, it removes, with the help of iron and ascorbate, four hydrogen atoms from
L-(alpha-aminoadipyl)-L-cysteinyl-d-valine to form the azetidinone and thiazolidine rings of isopenicillin. IPNS is an
enzyme of about 330 amino-acid residues. Two cysteines are conserved in fungal and bacterial IPNS sequences; these may
be involved in iron-binding and/or substrate-binding. Cephalosporium acremonium DAOCS/DACS [3] is a bifunctional enzyme
involved in cephalosporin biosynthesis. The DAOCS domain, which is structurally related to IPNS, catalyzes the step from
penicillin N to deacetoxycephalosporin C, which is used as a substrate by DACS to form deacetylcephalosporin C.
Streptomyces clavuligerus possesses a monofunctional DAOCS enzyme (gene cefE) [4] also related to IPNS. Two signature
patterns for these enzymes were derived, centered around the conserved cysteine residues.
[0610] Consensus pattern: [RK]-x-[STA]-x(2)-S-x-C-Y-[SL]-
Consensus pattern: [LIVM](2)-x-C-G-[STA]-x(2)-[STAG]-x(2)-T-x-[DNG]-

[ 1] Martin J.F. Trends Biotechnol. 5:306-308(1987).
[ 2] Chen G., Shiffman D., Mevarech M., Aharonowitz Y. Trends Biotechnol. 8:105-111(1990).
[ 3] Samson S.M., Dotzlaf J.E., Slisz M.L., Becker G.W., van Frank R.M., Veal L.E., Yeh W.K., Miller J.R., Queener
S.W., Ingolia T.D. Bio/Technology 5:1207-1214(1987).
[ 4] Kovacevic S., Weigel B.J., Tobin M.B., Ingolia T.D., Miller J.R. J. Bacteriol. 171:754-760(1989).

[0611] 200. Fibrillarin signature
Fibrillarin [1] is a component of a nucleolar small nuclear ribonucleoprotein (SnRNP) particle thought to participate in
the first step of the processing of pre-rRNA. In mammals, fibrillarin is associated with the U3, U8 and U13 small nuclear
RNAs. Fibrillarin is an extremely well conserved protein of about 320 amino acid residues. Structurally it consists
of three different domains: - An N-terminal domain of about 80 amino acids which is very rich in glycine and contains
a number of dimethylated arginine residues (DMA). - A central domain of about 90 residues which resembles that of
RNA-binding proteins and contains an octameric sequence similar to the RNP-2 consensus found in such proteins. -
A C-terminal alpha-helical domain. A protein evolutionarily related to fibrillarin has been found in archaebacteria
such as Methanococcus vannielii or voltae [3]. This protein (gene flpA) is involved in pre-rRNA processing. It lacks the
Gly/Arg-rich N-terminal domain. As a signature pattern, a region was selected that starts with and encompasses the
RNP-2 like octapeptide sequence.

[0612] Consensus pattern: [GST]-[LIVMAP]-V-Y-A-[IV]-E-[FY]-[SA]-x-R-x(2)-R-[DE]-

[ 1] Aris J.P., Blobel G. Proc. Natl. Acad. Sci. U.S.A. 88:931-935(1991).
[ 2] Bandziulis R.J., Swanson M.S., Dreyfuss G. Genes Dev. 3:431-437(1989).
[ 3] Agha-Amiri K. J. Bacteriol. 176:2124-2127(1994).

[0613] 201. Filamin/ABP280 repeat

[0614] [1] Fucini P, Renner C, Herberhold C, Noegel AA, Holak TA. Nat Struct Biol 1997;4:223-230.

[0616] This family of fucosyltransferases comprises the enzymes that transfer fucose from GDP-fucose to GlcNAc in an
alpha1,3 linkage [1].
[0617] [1] Breton C, Oriol R, Imberty A; Glycobiology 1998;8:87-94.

[0618] 203. 2Fe-2S ferredoxins, iron-sulfur binding region signature (fer2A)

Ferredoxins [1] are a group of iron-sulfur proteins which mediate electron transfer in a wide variety of metabolic
reactions. Ferredoxins can be divided into several subgroups depending upon the physiological nature of the iron-sulfur
cluster(s) and according to sequence similarities. One of these subgroups is the 2Fe-2S ferredoxins, which are
proteins or domains of around one hundred amino acid residues that bind a single 2Fe-2S iron-sulfur cluster. The proteins
that are known [2] to belong to this family are listed below. - Ferredoxin from photosynthetic organisms; namely plants
and algae, where it is located in the chloroplast or cyanelle; and cyanobacteria. - Ferredoxin from archaebacteria of
the Halobacterium genus. - Ferredoxin IV (gene pftA) and V (gene fdxD) from Rhodobacter capsulatus. - Ferredoxin
in the toluene degradation operon (gene xylT) and naphthalene degradation operon (gene nahT) of Pseudomonas
putida. - Hypothetical Escherichia coli protein yfaE. - The N-terminal domain of the bifunctional ferredoxin/ferredoxin
reductase electron transfer component of the benzoate 1,2-dioxygenase complex (gene benC) from Acinetobacter
calcoaceticus, the toluene 4-monooxygenase complex (gene tmoF), the toluate 1,2-dioxygenase system (gene xylZ),
and the xylene monooxygenase system (gene xylA) from Pseudomonas. - The N-terminal domain of phenol
hydroxylase protein p5 (gene dmpP) from Pseudomonas putida. - The N-terminal domain of methane monooxygenase
component C (gene mmoC) from Methylococcus capsulatus. - The C-terminal domain of the vanillate degradation pathway
protein vanB in a Pseudomonas species. - The N-terminal domain of bacterial fumarate reductase iron-sulfur protein
(gene frdB). - The N-terminal domain of CDP-6-deoxy-3,4-glucoseen reductase (gene ascD) from Yersinia
pseudotuberculosis. - The central domain of eukaryotic succinate dehydrogenase (ubiquinone) iron-sulfur protein. - The
N-terminal domain of eukaryotic xanthine dehydrogenase. - The N-terminal domain of eukaryotic aldehyde oxidase. In
the 2Fe-2S ferredoxins, four cysteine residues bind the iron-sulfur cluster. Three of these cysteines are clustered
together in the same region of the protein. Our signature pattern spans that iron-sulfur binding region.
[0619] Consensus pattern: C-{C}-{C}-[GA]-{C}-C-[GAST]-{CPDEKRHFYW}-C [The three C's are 2Fe-2S ligands]-

[ 1] Meyer J. Trends Ecol. Evol. 3:222-226(1988).
[ 2] Harayama S., Polissi A., Rekik M. FEBS Lett. 285:85-88(1991).
[0620] Adrenodoxin family, iron-sulfur binding region signature (fer2B) 

Ferredoxins [1] are a group of iron-sulfur proteins which mediate electron transfer in a wide variety of metabolic
reactions. Ferredoxins can be divided into several subgroups depending upon the physiological nature of the iron-sulfur
cluster(s) and according to sequence similarities. One family of ferredoxins groups together the following proteins that
all bind a single 2Fe-2S iron-sulfur cluster: - Adrenodoxin (ADX) (adrenal ferredoxin), a vertebrate mitochondrial protein
which transfers electrons from adrenodoxin reductase to cytochrome P450scc, which is involved in cholesterol side
chain cleavage. - Putidaredoxin (PTX), a Pseudomonas putida protein which transfers electrons from putidaredoxin
reductase to cytochrome P450-cam, which is involved in the oxidation of camphor. - Terpredoxin [2], a Pseudomonas
protein which transfers electrons from terpredoxin reductase to cytochrome P450-terp, which is involved in the
oxidation of alpha-terpineol. - Rhodocoxin [3], a Rhodococcus protein which transfers
electrons from rhodocoxin reductase to cytochrome CYP116 (thcB), which is involved in the degradation of
thiocarbamate herbicides. - Escherichia coli ferredoxin (gene fdx) [4], whose exact function is not yet known. - Rhodobacter
capsulatus ferredoxin VI [5], which may transfer electrons to an as yet uncharacterized oxygenase. - Caulobacter
crescentus ferredoxin (gene fdxB) [6]. In these proteins, four cysteine residues bind the iron-sulfur cluster. Three of these
cysteines are clustered together in the same region of the protein. Our signature pattern spans that iron-sulfur binding
region.

[0621] Consensus pattern: C-x(2)-[STAQ]-x-[STAMV]-C-[STA]-T-C-[HR] [The three C's are 2Fe-2S ligands]-

[ 1] Meyer J. Trends Ecol. Evol. 3:222-226(1988).
[ 2] Peterson J.A., Lu J.-Y., Geisselsoder J., Graham-Lorence S., Carmona C., Witney F., Lorence M.C. J. Biol.
Chem. 267:14193-14203(1992).
[ 3] Nagy I., Schoofs G., Compernolle F., Proost P., Vanderleyden J., De Mot R. J. Bacteriol. 177:676-687(1995).
[ 4] Ta D.T., Vickery L.E. J. Biol. Chem. 267:11120-11125(1992).
[ 5] Naud I., Vincon M., Garin J., Gaillard J., Forest E., Jouanneau Y. Eur. J. Biochem. 222:933-939(1994).
[ 6] Amemiya K. EMBL/GenBank: X51607.

[0622] 204. 4Fe-4S ferredoxins, iron-sulfur binding region signature (fer4)

Ferredoxins [1] are a group of iron-sulfur proteins which mediate electron transfer in a wide variety of metabolic
reactions. Ferredoxins can be divided into several subgroups depending upon the physiological nature of the iron-sulfur
cluster(s). One of these subgroups is the 4Fe-4S ferredoxins, which are found in bacteria and which are thus often
referred to as 'bacterial-type' ferredoxins. The structure of these proteins [2] consists of the duplication of a domain of
twenty-six amino acid residues; each of these domains contains four cysteine residues that bind to a 4Fe-4S center.
A number of proteins have been found [3] that include one or more 4Fe-4S binding domains similar to those of
bacterial-type ferredoxins. These proteins are listed below (references are only provided for recently determined sequences). -

[0623] The iron-sulfur proteins of the succinate dehydrogenase and the fumarate reductase complexes (EC 1.3.99.1).
These enzyme complexes, which are components of the tricarboxylic acid cycle, each contain three subunits: a
flavoprotein, an iron-sulfur protein, and a b-type cytochrome. The iron-sulfur proteins contain three different iron-sulfur
centers: a 2Fe-2S, a 3Fe-3S and a 4Fe-4S. - Escherichia coli anaerobic glycerol-3-phosphate dehydrogenase (EC
1.1.99.5). This enzyme is composed of three subunits: A, B, and C. The C subunit seems to be an iron-sulfur protein
with two ferredoxin-like domains in the N-terminal part of the protein. - Escherichia coli anaerobic dimethyl sulfoxide
reductase. The B subunit of this enzyme (gene dmsB) is an iron-sulfur protein with four 4Fe-4S ferredoxin-like domains.
- Escherichia coli formate hydrogenlyase. Two of the subunits of this oligomeric complex (genes hycB and hycF) seem
to be iron-sulfur proteins that each contain two 4Fe-4S ferredoxin-like domains. - Methanobacterium formicicum formate
dehydrogenase (EC 1.2.1.2). This enzyme is used by the archaebacteria to grow on formate. The beta chain of this
dimeric enzyme probably binds two 4Fe-4S centers. - Escherichia coli formate dehydrogenases N and O (EC 1.2.1.2).
The beta chains of these two enzymes (genes fdnH and fdoH) are iron-sulfur proteins with four 4Fe-4S ferredoxin-like
domains. - Desulfovibrio periplasmic [Fe] hydrogenase (EC 1.18.99.1). The large chain of this dimeric enzyme binds
three 4Fe-4S centers, two of which are located in the ferredoxin-like N-terminal region of the protein. -
Methanobacterium thermoautotrophicum methyl viologen-reducing hydrogenase subunit mvhB, which contains six tandemly repeated
ferredoxin-like domains and which probably binds twelve 4Fe-4S centers. - Salmonella typhimurium anaerobic sulfite
reductase (EC 1.8.1.-) [4]. Two of the subunits of this enzyme (genes asrA and asrC) seem to both bind two 4Fe-4S
centers. - A ferredoxin-like protein (gene fixX) from the nitrogen-fixation genes locus of various Rhizobium species,
and one from the Nif-region of Azotobacter species. - The 9 Kd polypeptide of chloroplast photosystem I [5] (gene
psaC). This protein contains two low potential 4Fe-4S centers, referred to as the A and B centers. - The chloroplast frxB
protein, which is predicted to carry two 4Fe-4S centers. - A ferredoxin from a primitive eukaryote, the enteric amoeba
Entamoeba histolytica. - Escherichia coli hypothetical protein yjjW, a protein with an N-terminal region belonging to the
radical activating enzymes family (see <PDOC00834>) and two potential 4Fe-4S centers. The pattern of cysteine
residues in the iron-sulfur region is sufficient to detect this class of 4Fe-4S binding proteins.

[0624] Consensus pattern: C-x(2)-C-x(2)-C-x(3)-C-[PEG] [The four C's are 4Fe-4S ligands]-

[ 1] Meyer J. Trends Ecol. Evol. 3:222-226(1988).
[ 2] Otaka E., Ooi T. J. Mol. Evol. 26:257-267(1987).
[ 3] Beinert H. FASEB J. 4:2483-2492(1990).
[ 4] Huang C.J., Barrett E.L. J. Bacteriol. 173:1544-1553(1991).
[ 5] Knaff D.B. Trends Biochem. Sci. 13:460-461(1988).
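
Since bacterial-type ferredoxins and the related proteins listed above carry one or more 4Fe-4S binding domains, it can be useful to report every occurrence of the pattern C-x(2)-C-x(2)-C-x(3)-C-[PEG] in a sequence, not only the first. The sketch below does this with a hand-written regular expression wrapped in a lookahead, so that overlapping occurrences would also be reported. The function name `find_4fe4s_motifs` and the toy sequence are assumptions made for illustration only.

```python
import re

# Regex form of C-x(2)-C-x(2)-C-x(3)-C-[PEG]. The lookahead (?=...) consumes
# no characters, so matches that overlap each other are all found.
FER4_LOOKAHEAD = re.compile(r"(?=(C.{2}C.{2}C.{3}C[PEG]))")

def find_4fe4s_motifs(sequence):
    """Return the 0-based start index of every match of the 4Fe-4S pattern."""
    return [m.start() for m in FER4_LOOKAHEAD.finditer(sequence.upper())]

# Made-up sequence containing two copies of the motif; not a real ferredoxin.
toy = "AA" + "CAACGGCAAACP" + "AAAAA" + "CTTCSSCQQQCE" + "AA"
print(find_4fe4s_motifs(toy))
```

The number of hits gives only a first estimate of how many 4Fe-4S centers a protein might bind; the text notes, for example, that some proteins carry additional centers outside the ferredoxin-like region.
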

[0625] 205. NifH/frxC family signatures (fer4_NifH)

Nitrogenase (EC 1.18.6.1) [1] is the enzyme system responsible for biological nitrogen fixation. Nitrogenase is an
oligomeric complex which consists of two components: component 1, which contains the active site for the reduction
of nitrogen to ammonia, and component 2 (also called the iron protein). Component 2 is a homodimer of a protein (gene
nifH) which binds a single 4Fe-4S iron-sulfur cluster [2]. In the nitrogen fixation process nifH is first reduced by a protein
such as ferredoxin; the reduced protein then transfers electrons to component 1 with the concomitant consumption of
ATP. A number of proteins are known to be evolutionarily related to nifH. These proteins are: - Chloroplast encoded frxC
(or chlL) protein [3]. FrxC is encoded on the chloroplast genome of some plant species; its exact function is not known,
but it could act as an electron carrier in the conversion of protochlorophyllide to chlorophyllide. - Rhodobacter capsulatus
proteins bchL and bchX [4]. These proteins are also likely to play a role in chlorophyll synthesis. There are a number
of conserved regions in the sequence of these proteins: in the N-terminal section there is an ATP-binding site motif 'A'
(P-loop) and in the central section there are two conserved cysteines which have been shown, in nifH, to be the ligands
of the 4Fe-4S cluster. Two signature patterns that correspond to the regions around these cysteines were developed.
[0626] Consensus pattern: E-x-G-G-P-x(2)-[GA]-x-G-C-[AG]-G [C binds the iron-sulfur center]-
Consensus pattern: D-x-L-G-D-V-V-C-G-G-F-[AG]-x-P [C binds the iron-sulfur center]-

[ 1] Pau R.N. Trends Biochem. Sci. 14:183-186(1989).
[ 2] Georgiadis M.M., Komiya H., Chakrabarti P., Woo D., Kornuc J.J., Rees D.C. Science 257:1653-1659(1992).
[ 3] Fujita Y., Takahashi Y., Kohchi T., Ozeki H., Ohyama K., Matsubara H. Plant Mol. Biol. 13:551-561(1989).
[ 4] Burke D.H., Alberti M., Hearst J.E. J. Bacteriol. 175:2407-2413(1993).

[0627] 206. Ferritin iron-binding regions signatures

Ferritin [1,2] is one of the major non-heme iron storage proteins. It consists of a mineral core of hydrated ferric oxide,
and a multi-subunit protein shell which encloses the former and ensures its solubility in an aqueous environment. In
animals the protein is mainly cytoplasmic and there are generally two or more genes that encode closely related
subunits (in mammals there are two subunits which are known as H(eavy) and L(ight)). In plants ferritin is found in the
chloroplast [3]. There are a number of well conserved regions in the sequence of ferritins. Two of these regions were
selected to develop signature patterns. The first pattern is located in the central part of the sequence of ferritin and it
contains three conserved glutamates which are thought to be involved in the binding of iron. The second pattern is
located in the C-terminal section; it corresponds to a region which forms a hydrophilic channel through which small
molecules and ions can gain access to the central cavity of the molecule; this pattern also includes conserved acidic
residues which are potential metal-binding sites.

[0628] Consensus pattern: E-x-[KR]-E-x(2)-E-[KR]-[LF]-[LIVMA]-x(2)-Q-N-x-R-x-G-R [The 3 E's are potential iron
ligands]-

Consensus pattern: D-x(2)-[LIVMF]-[STAC]-[DH]-F-[LI]-[EN]-x(2)-[FY]-L-x(6)-[LIVM]-[KN] [The second D and the E are
potential iron ligands]-

[ 1] Crichton R.R., Charloteaux-Wauters M. Eur. J. Biochem. 164:485-506(1987).
[ 2] Theil E.C. Annu. Rev. Biochem. 56:289-315(1987).
[ 3] Ragland M., Briat J.-F., Gagnon J., Laulhere J.-P., Massenet O., Theil E.C. J. Biol. Chem. 265:18339-18344
(1990).
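
The first ferritin pattern, E-x-[KR]-E-x(2)-E-[KR]-[LF]-[LIVMA]-x(2)-Q-N-x-R-x-G-R, is annotated with specific residues (the three glutamates that are potential iron ligands). A minimal sketch of how those annotated positions could be recovered from a match is shown below, using named groups in a hand-written regular expression. The function name `iron_ligand_positions` and the test fragment are illustrative assumptions.

```python
import re

# Regex form of the first ferritin pattern; the three annotated glutamates
# are captured in named groups so their positions can be reported.
FERRITIN_1 = re.compile(
    r"(?P<e1>E).[KR](?P<e2>E).{2}(?P<e3>E)[KR][LF][LIVMA].{2}QN.R.GR"
)

def iron_ligand_positions(sequence):
    """Return 1-based positions of the three candidate iron-ligand glutamates,
    or None if the pattern is not found."""
    m = FERRITIN_1.search(sequence.upper())
    if m is None:
        return None
    return tuple(m.start(name) + 1 for name in ("e1", "e2", "e3"))

# Illustrative fragment chosen to satisfy the pattern; positions are 1-based.
print(iron_ligand_positions("MMEEREHAEKLMKLQNQRGGRAA"))
```

Positions are reported 1-based because residue numbering in protein sequences conventionally starts at 1; only the first match in the sequence is examined.
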

[0629] 207. Intermediate filaments signature (filament)

Intermediate filaments (IF) [1,2,3] are proteins which are primordial components of the cytoskeleton and the nuclear
envelope. They generally form filamentous structures 8 to 14 nm wide. IF proteins are members of a very large multigene
family of proteins which has been subdivided into five major subgroups: - Type I: Acidic cytokeratins. - Type II: Basic
cytokeratins. - Type III: Vimentin, desmin, glial fibrillary acidic protein (GFAP), peripherin, and plasticin. - Type IV:
Neurofilaments L, H and M, alpha-internexin and nestin. - Type V: Nuclear lamins A, B1, B2 and C. All IF proteins are
structurally similar in that they consist of: a central rod domain comprising some 300 to 350 residues which is arranged
in coiled-coil alpha-helices, with at least two short characteristic interruptions; an N-terminal non-helical domain (head)
of variable length; and a C-terminal domain (tail) which is also non-helical, and which shows extreme length variation
between different IF proteins. While IF proteins are evolutionarily and structurally related, they have limited sequence
homologies except in several regions of the rod domain. A conserved region at the C-terminal extremity of the rod
domain was used as a sequence pattern for this class of proteins.
[0630] Consensus pattern: [IV]-x-[TACI]-Y-[RKH]-x-[LM]-L-[DE]-

[ 1] Quinlan R., Hutchison C., Lane B. Protein Prof. 2:801-952(1995).
[ 2] Steinert P.M., Roop D.R. Annu. Rev. Biochem. 57:593-625(1988).
[ 3] Stewart M. Curr. Opin. Cell Biol. 2:91-100(1990).

[0631] 208. Flavodoxin signature

Flavodoxins [1,E1] are electron-transfer proteins that function in various electron transport systems. Flavodoxins bind
one FMN molecule, which serves as a redox-active prosthetic group. Flavodoxins are functionally interchangeable with
ferredoxins. They have been isolated from prokaryotes, cyanobacteria, and some eukaryotic algae. The signature
pattern for these proteins is derived from a conserved region in their N-terminal section; this region is involved in the
binding of the FMN phosphate group.

[0632] Consensus pattern: [LIV]-[LIVFY]-[FY]-x-[ST]-x(2)-[AGC]-x-T-x(3)-A-x(2)-[LIV]-
[ 1] Wakabayashi S., Kimura K., Matsubara H., Rogers L.J. Biochem. J. 263:981-984(1989).
[0633] 209. Growth factor and cytokines receptors family signatures (fn3)

A number of receptors for lymphokines, hematopoietic growth factors and growth hormone-related molecules have
been found [1 to 5] to share a common binding domain. Receptors known to belong to this family are: - Cytokine
receptor common beta chain. This chain is common to the IL-3, IL-5 and GM-CSF receptors. - Cytokine receptor
common gamma chain. This chain is common to the IL-2, IL-4, IL-7 and IL-13 receptors. - Ciliary neurotrophic factor
receptor (CNTFR). - Erythropoietin receptor (EPOR). - Granulocyte colony-stimulating factor receptor (G-CSFR). -
Granulocyte-macrophage colony-stimulating factor receptor alpha chain (GM-CSFR). - Interleukin-2 receptor beta
chain (IL2R-beta). - Interleukin-3 receptor alpha chain (IL3R). - Interleukin-4 receptor alpha chain (IL4R). -
Interleukin-5 receptor alpha chain (IL5R). - Interleukin-6 receptor (IL6R). - Interleukin-7 receptor alpha chain (IL7R). -
Interleukin-9 receptor (IL9R). - Growth hormone receptor (GRHR). - Prolactin receptor (PRLR). - Thrombopoietin receptor (TPOR).
The conserved region constitutes all or part of the extracellular ligand-binding region and is about 200 amino acid
residues long. In the N-terminal part of this domain there are two pairs of cysteines known, in the growth hormone receptor,
to be involved in disulfide bonds.

[Schematic of the receptor: an extracellular region carrying the four conserved cysteines (C C C C), a transmembrane
segment (XXXXXXX) and a cytoplasmic region.]

Two patterns to detect this family of receptors were used. The first one is derived from the first N-terminal disulfide
loop, the second is a tryptophan-rich pattern located at the C-terminal extremity of the extracellular region.

[0634] Consensus pattern: C-[LVFYR]-x(7,8)-[STIVDN]-C-x-W [The two C's are linked by a disulfide bond]-
Consensus pattern: [STGL]-x-W-[SG]-x-W-S-

[ 1] Bazan J.F. Biochem. Biophys. Res. Commun. 164:788-795(1989).
[ 2] Bazan J.F. Proc. Natl. Acad. Sci. U.S.A. 87:6934-6938(1990).
[ 3] Cosman D., Lyman S.D., Idzerda R.L., Beckmann M.P., Park L.S., Goodwin R.G., March C.J. Trends Biochem.
Sci. 15:265-270(1990).
[ 4] d'Andrea A.D., Fasman G.D., Lodish H.F. Cell 58:1023-1024(1989).
[ 5] d'Andrea A.D., Fasman G.D., Lodish H.F. Curr. Opin. Cell Biol. 2:648-651(1990).

[0635] 210. Phosphoribosylglycinamide formyltransferase active site (formyl_transO 

Phosphoribosylglycinamide formyltransferase (EC 2.1.2.2) (GART) [1] catalyzes the third step in de novo purine biosynthesis, the transfer of a formyl group to 5'-phosphoribosylglycinamide. In higher eukaryotes, GART is part of a multifunctional enzyme polypeptide that catalyzes three of the steps of purine biosynthesis. In bacteria, plants and yeast, GART is a monofunctional protein of about 200 amino-acid residues. In the Escherichia coli enzyme, an aspartic acid residue has been shown to be involved in the catalytic mechanism. The region around this active site residue is well conserved in GART from prokaryotic and eukaryotic sources and can be used as a signature pattern.

Mammalian formyltetrahydrofolate dehydrogenase (EC 1.5.1.6) [2] is a cytosolic enzyme responsible for the NADP-dependent decarboxylative reduction of 10-formyltetrahydrofolate into tetrahydrofolate. It is a protein of about 900 amino acids consisting of three domains; the N-terminal domain (200 residues) is structurally related to GARTs.

Escherichia coli methionyl-tRNA formyltransferase (EC 2.1.2.9) (gene fmt) [3] is the enzyme responsible for modifying the free amino group of the aminoacyl moiety of methionyl-tRNA(fMet). The central part of fmt seems to be evolutionarily related to GART's active site region.

[0636] Consensus pattern: G-x-[STM]-[IVT]-x-[FYWVQ]-[VMAT]-x-[DEVM]-x-[LIVMY]-D-x-G-x(2)-[LIVT]-x(6)-[LIVM] [D is the active site residue]

[1] Inglese J., Smith J.M., Benkovic S.J. Biochemistry 29:6678-6687(1990).
[2] Cook R.J., Lloyd R.S., Wagner C. J. Biol. Chem. 266:4965-4973(1991).
[3] Guillon J.-M., Mechulam Y., Schmitter J.-M., Blanquet S., Fayat G. J. Bacteriol. 174:4294-4301(1992).

[0637] 211. G10 protein signatures

A Xenopus protein known as G10 [1] has been found to be highly conserved in a wide range of eukaryotic species. The function of G10 is still unknown. G10 is a hydrophilic protein of about 17 to 18 Kd (143 to 157 residues) whose C-terminal half is rich in cysteines and could be involved in metal-binding. Two of these cysteine-rich segments were selected as signature patterns.

[0638] Consensus pattern: L-C-C-x-[KR]-C-x(4)-[DE]-x-N-x(4)-C-x-C-R-V-P
Consensus pattern: C-x-H-C-G-C-[KRH]-G-C-[SA]

[0639] [1] McGrew L.L., Dworkin-Rastl E., Dworkin M.B., Richter J.D. Genes Dev. 3:803-815(1989).
[0640] 212. G-protein alpha subunit

[0641] G proteins couple receptors of extracellular signals to intracellular signaling pathways. The G protein alpha subunit binds guanyl nucleotide and is a weak GTPase. Number of members: 195

[1] Coleman DE, Berghuis AM, Lee E, Linder ME, Gilman AG, Sprang SR, Science 1994;265:1405-1412.
[2] How G proteins work: a continuing story. Coleman DE, Sprang SR, Trends Biochem Sci 1996;21:41-44.

[0642] 213. Glucose-6-phosphate dehydrogenase active site (G6PD) 

Glucose-6-phosphate dehydrogenase (EC 1.1.1.49) (G6PD) [1] catalyzes the first step in the pentose pathway, the oxidation of glucose-6-phosphate to gluconolactone 6-phosphate. A lysine residue has been identified as a reactive nucleophile associated with the activity of the enzyme. The sequence around this lysine is totally conserved from bacterial to mammalian G6PDs and can be used as a signature pattern.
[0643] Consensus pattern: D-H-Y-L-G-K-[EQK] [K is the active site residue]

[0644] [1] Jeffery J., Persson B., Wood I., Bergman T., Jeffery R., Joernvall H. Eur. J. Biochem. 212:41-49(1993).
[0645] 214. GATA-type zinc finger domain 
The GATA family of transcription factors are proteins that bind to DNA sites with the consensus sequence (A/T)GATA(A/G), found within the regulatory region of a number of genes. Proteins currently known to belong to this family are:

- GATA-1 [1] (also known as Eryf1, GF-1 or NF-E1), which binds to the GATA region of globin genes and other genes expressed in erythroid cells. It is a transcriptional activator which probably serves as a general 'switch' factor for erythroid development.
- GATA-2 [2], a transcriptional activator which regulates endothelin-1 gene expression in endothelial cells.
- GATA-3 [3], a transcriptional activator which binds to the enhancer of the T-cell receptor alpha and delta genes.
- GATA-4 [4], a transcriptional activator expressed in endodermally derived tissues and heart.
- Drosophila protein pannier (or DGATAa) (gene pnr), which acts as a repressor of the achaete-scute complex (as-c).
- Bombyx mori BCFI [5], which regulates the expression of chorion genes.
- Caenorhabditis elegans elt-1 and elt-2, transcriptional activators of genes containing the GATA region, including vitellogenin genes [6].
- Ustilago maydis urbs1 [7], a protein involved in the repression of the biosynthesis of siderophores.
- Fission yeast protein GAF2.

All these transcription factors contain a pair of highly similar 'zinc finger' type domains with the consensus sequence C-x2-C-x17-C-x2-C. Some other proteins contain a single zinc finger motif highly related to those of the GATA transcription factors. These proteins are:

- Drosophila box A-binding factor (ABF) (also known as protein serpent (gene srp)), which may function as a transcriptional activator protein and may play a key role in the organogenesis of the fat body.
- Emericella nidulans areA [8], a transcriptional activator which mediates nitrogen metabolite repression.
- Neurospora crassa nit-2 [9], a transcriptional activator which turns on the expression of genes coding for enzymes required for the use of a variety of secondary nitrogen sources, during conditions of nitrogen limitation.
- Neurospora crassa white collar proteins 1 and 2 (WC-1 and WC-2), which control expression of light-regulated genes.
- Saccharomyces cerevisiae DAL81 (or UGA43), a negative nitrogen regulatory protein.
- Saccharomyces cerevisiae GLN3, a positive nitrogen regulatory protein.
- Saccharomyces cerevisiae GAT1.
- Saccharomyces cerevisiae GZF3.

[0646] Consensus pattern: C-x-[DN]-C-x(4,5)-[ST]-x(2)-W-[HR]-[RK]-x(3)-[GN]-x(3,4)-C-N-[AS]-C [The four C's are zinc ligands]
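
To illustrate how the four zinc ligands named in this pattern could be located in a sequence, the following sketch writes the pattern as a Python regular expression with one capture group per cysteine. The function name `zinc_ligand_positions` and the test sequence (an illustrative GATA-finger-like stretch) are assumptions made for the example only.

```python
import re

# The consensus pattern above, with each zinc-ligand cysteine captured.
GATA_FINGER = re.compile(
    r"(C).[DN](C).{4,5}[ST].{2}W[HR][RK].{3}[GN].{3,4}(C)N[AS](C)"
)


def zinc_ligand_positions(sequence):
    """Return the 1-based positions of the four cysteines, or None."""
    match = GATA_FINGER.search(sequence.upper())
    if match is None:
        return None
    return [match.start(i) + 1 for i in range(1, 5)]


# Illustrative finger-like stretch; C-x2-C-x17-C-x2-C spacing is visible
# in the returned positions.
print(zinc_ligand_positions("CVNCGATATPLWRRDRTGHYLCNACGLY"))
```

Only the first match is reported; a protein with a pair of fingers would need `finditer` instead of `search`.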

[1] Trainor C.D., Evans T., Felsenfeld G., Boguski M.S. Nature 343:92-96(1990).
[2] Lee M.E., Temizer D.T., Clifford J.A., Quertermous T. J. Biol. Chem. 266:16188-16192(1991).
[3] Ho I.-C., Vorhees P., Marin N., Oakley B.K., Tsai S.-F., Orkin S.H., Leiden J.M. EMBO J. 10:1187-1192(1991).
[4] Spieth J., Shim Y.H., Lea K., Conrad R., Blumenthal T. Mol. Cell. Biol. 11:4651-4659(1991).
[5] Drevet J.R., Skeiky Y.A., Iatrou K. J. Biol. Chem. 269:10660-10667(1994).
[6] Hawkins M.G., McGhee J.D. J. Biol. Chem. 270:14666-14671(1995).
[7] Voisard C.P.O., Wang J., Xu P., Leong S.A., McEvoy J.L. Mol. Cell. Biol. 13:7091-7100(1993).
[8] Arst H.N. Jr., Kudla B., Martinez-Rossi N.M., Caddick M.X., Sibley S., Davies R.W. Trends Genet. 5:291-291(1989).
[9] Fu Y.-H., Marzluf G.A. Mol. Cell. Biol. 10:1056-1065(1990).

[0647] 215. Glutamine amidotransferases class-I active site (GATase)

A large group of biosynthetic enzymes are able to catalyze the removal of the ammonia group from glutamine and then to transfer this group to a substrate to form a new carbon-nitrogen group. This catalytic activity is known as glutamine amidotransferase (GATase) (EC 2.4.2.-) [1]. The GATase domain exists either as a separate polypeptidic subunit or as part of a larger polypeptide fused in different ways to a synthase domain. On the basis of sequence similarities two classes of GATase domains have been identified [2,3]: class-I (also known as trpG-type) and class-II (also known as purF-type). Class-I GATase domains have been found in the following enzymes:

- The second component of anthranilate synthase (AS) (EC 4.1.3.27) [4]. AS catalyzes the biosynthesis of anthranilate from chorismate and glutamine. AS is generally a dimeric enzyme: the first component can synthesize anthranilate using ammonia rather than glutamine, whereas component II provides the GATase activity. In some bacteria and in fungi the GATase component of AS is part of a multifunctional protein that also catalyzes other steps of the biosynthesis of tryptophan.
- The second component of 4-amino-4-deoxychorismate (ADC) synthase (EC 4.1.3.-), a dimeric prokaryotic enzyme that functions in the pathway that catalyzes the biosynthesis of para-aminobenzoate (PABA) from chorismate and glutamine. The second component (gene pabA) provides the GATase activity [4].
- CTP synthase (EC 6.3.4.2). CTP synthase catalyzes the final reaction in the biosynthesis of pyrimidine, the ATP-dependent formation of CTP from UTP and glutamine. CTP synthase is a single chain enzyme that contains two distinct domains; the GATase domain is in the C-terminal section [2].
- GMP synthase (glutamine-hydrolyzing) (EC 6.3.5.2). GMP synthase catalyzes the ATP-dependent formation of GMP from xanthosine 5'-phosphate and glutamine. GMP synthase is a single chain enzyme that contains two distinct domains; the GATase domain is in the N-terminal section [5].
- Glutamine-dependent carbamoyl-phosphate synthase (EC 6.3.5.5) (GD-CPSase), an enzyme involved in both arginine and pyrimidine biosynthesis and which catalyzes the ATP-dependent formation of carbamoyl phosphate from glutamine and carbon dioxide. In bacteria GD-CPSase is composed of two subunits: the large chain (gene carB) provides the CPSase activity, while the small chain (gene carA) provides the GATase activity. In yeast the enzyme involved in arginine biosynthesis is also composed of two subunits: CPA1 (GATase) and CPA2 (CPSase). In most eukaryotes, the first three steps of pyrimidine biosynthesis are catalyzed by a large multifunctional enzyme (called URA2 in yeast, rudimentary in Drosophila, and CAD in mammals). The GATase domain is located at the N-terminal extremity of this polyprotein [6].
- Phosphoribosylformylglycinamidine synthase II (EC 6.3.5.3), an enzyme that catalyzes the fourth step in the de novo biosynthesis of purines. In some species of bacteria, FGAM synthase II is composed of two subunits: a small chain (gene purQ) which provides the GATase activity and a large chain (gene purL) which provides the aminator activity.
- The histidine amidotransferase hisH, an enzyme that catalyzes the fifth step in the biosynthesis of histidine in prokaryotes.

In the second component of AS a cysteine has been shown [7] to be essential for the amidotransferase activity. The sequence around this residue is well conserved in all the above GATase domains and can be used as a signature pattern for class-I GATase.

[0648] Consensus pattern: [...]-G-[LIVMFY]-C-[LIVMFYN]-G-x-[QEH]-x-[LIVMFA] [C is the active site residue]

[1] Buchanan J.M. Adv. Enzymol. 39:91-183(1973).
[2] Weng M., Zalkin H. J. Bacteriol. 169:3023-3028(1987).
[3] Nyunoya H., Lusty C.J. J. Biol. Chem. 259:9790-9798(1984).
[4] Crawford I.P. Annu. Rev. Microbiol. 43:567-600(1989).
[5] Zalkin H., Argos P., Narayana S.V.L., Tiedeman A.A., Smith J.M. J. Biol. Chem. 260:3350-3354(1985).
[6] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B. BioEssays 15:157-164(1993).
[7] Tso J.Y., Hermodson M.A., Zalkin H. J. Biol. Chem. 255:1451-1457(1980).

[0649] 216. Glutamine amidotransferases class-II active site (GATase_2)

A large group of biosynthetic enzymes are able to catalyze the removal of the ammonia group from glutamine and then to transfer this group to a substrate to form a new carbon-nitrogen group. This catalytic activity is known as glutamine amidotransferase (GATase) (EC 2.4.2.-) [1]. The GATase domain exists either as a separate polypeptidic subunit or as part of a larger polypeptide fused in different ways to a synthase domain. On the basis of sequence similarities two classes of GATase domains have been identified [2,3]: class-I (also known as trpG-type) and class-II (also known as purF-type). Class-II GATase domains have been found in the following enzymes:

- Amido phosphoribosyltransferase (glutamine phosphoribosylpyrophosphate amidotransferase) (EC 2.4.2.14). An enzyme which catalyzes the first step in purine biosynthesis, the transfer of the ammonia group of glutamine to PRPP to form 5-phosphoribosylamine (gene purF in bacteria, ADE4 in yeast).
- Glucosamine-fructose-6-phosphate aminotransferase (EC 2.6.1.16). This enzyme catalyzes a key reaction in amino sugar synthesis, the formation of glucosamine 6-phosphate from fructose 6-phosphate and glutamine (gene glmS in Escherichia coli, nodM in Rhizobium, GFA1 in yeast).
- Asparagine synthetase (glutamine-hydrolyzing) (EC 6.3.5.4). This enzyme is responsible for the synthesis of asparagine from aspartate and glutamine.

A cysteine is present at the N-terminal extremity of the mature form of all these enzymes. The cysteine has been shown, in amido phosphoribosyltransferase [4] and in asparagine synthetase [5], to be important for the catalytic mechanism.
[0650] Consensus pattern: <x(0,11)-C-[GS]-[IV]-[LIVMFYW]-[AG] [C is the active site residue]
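
The leading `<` in this pattern anchors it to the N-terminus, so the catalytic cysteine must occur within the first twelve residues of the mature protein. A minimal sketch of that constraint follows; the function name `active_site_cysteine` and the short sequences are hypothetical illustrations, and the non-greedy quantifier is a choice made here so that the first qualifying cysteine is reported.

```python
import re

# '^' mirrors the '<' anchor; '.{0,11}?' allows at most eleven residues
# before the active site cysteine, preferring the earliest one.
CLASS_II_SITE = re.compile(r"^.{0,11}?(C)[GS][IV][LIVMFYW][AG]")


def active_site_cysteine(mature_sequence):
    """Return the 1-based position of the candidate cysteine, or None."""
    match = CLASS_II_SITE.match(mature_sequence.upper())
    return match.start(1) + 1 if match else None


print(active_site_cysteine("CGIVGIAGV"))           # cysteine at the very N-terminus
print(active_site_cysteine("A" * 12 + "CGIVGIA"))  # motif too far from the N-terminus
```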

[1] Buchanan J.M. Adv. Enzymol. 39:91-183(1973).
[2] Weng M., Zalkin H. J. Bacteriol. 169:3023-3028(1987).
[3] Nyunoya H., Lusty C.J. J. Biol. Chem. 259:9790-9798(1984).
[4] van Heeke G., Schuster M. J. Biol. Chem. 264:5503-5509(1989).
[5] Vollmer S.J., Switzer R.L., Hermodson M.A., Bower S.G., Zalkin H. J. Biol. Chem. 258:10582-10585(1983).
[0651] 217. GDP dissociation inhibitor (GDI) 

[0652] [1] Schalk I, Zeng K, Wu SK, Stura EA, Matteson J, Huang M, Tandon A, Wilson IA, Balch WE, Nature 1996;381:42-48.

[0653] 218. Oxidoreductase family (GFO_IDH_MocA) 

[0654] This family of enzymes utilises NADP or NAD. The family is called the GFO/IDH/MOCA family in SWISS-PROT.
[0655] [1] Kingston RL, Scopes RK, Baker EN, Structure 1996;4:1413-1428.
[0656] 219. GHMP kinases putative ATP-binding domain 

The following kinases contain, in their N-terminal section, a conserved Gly/Ser-rich region which is probably involved in the binding of ATP [1]. These kinases are listed below. - Galactokinase (EC 2.7.1.6). - Homoserine kinase (EC 2.7.1.39). - Mevalonate kinase (EC 2.7.1.36). - Phosphomevalonate kinase (EC 2.7.4.2). This group of kinases was called 'GHMP' (from the first letter of their substrates).

Consensus pattern: [LIVM]-[PK]-x-[GSTA]-x(0,1)-G-L-[GS]-S-S-[GSA]-[GSTAC]
[0657] [1] Tsay Y.H., Robinson G.W. Mol. Cell. Biol. 11:620-631(1991).
[0658] 220. Glucose inhibited division protein A family signatures (GIDA) 

Bacterial glucose inhibited division protein A (gene gidA) is a protein of 70 Kd whose function is not yet known and whose sequence is highly conserved. It is evolutionarily related to yeast hypothetical protein YGL236C, Caenorhabditis elegans hypothetical protein F52H3.2 and a Bacillus subtilis protein called gid (which is different from B. subtilis gidA). Two highly conserved regions were selected as signature patterns. Both regions are located in the central region of the protein.

[0659] Consensus pattern: [GS]-[PT]-x-Y-C-P-S-[LIVM]-E-x-K-[LIVM]-x-[KR]

Consensus pattern: A-G-Q-x-[NT]-G-x(2)-G-Y-x-E-[SAG](3)-[QS]-G-[LIVM](2)-A-G-[LIVMT]-N-A

[0660] 221. (GLFV_dehydrog)
Glu / Leu / Phe / Val dehydrogenases active site

- Glutamate dehydrogenases (EC 1.4.1.2, EC 1.4.1.3, and EC 1.4.1.4) (GluDH) are enzymes that catalyze the NAD- or NADP-dependent reversible deamination of glutamate into alpha-ketoglutarate [1,2]. GluDH isozymes are generally involved with either ammonia assimilation or glutamate catabolism.
- Leucine dehydrogenase (EC 1.4.1.9) (LeuDH) is a NAD-dependent enzyme that catalyzes the reversible deamination of leucine and several other aliphatic amino acids to their keto analogues [3].
- Phenylalanine dehydrogenase (EC 1.4.1.20) (PheDH) is a NAD-dependent enzyme that catalyzes the reversible deamination of L-phenylalanine into phenylpyruvate [4].
- Valine dehydrogenase (EC 1.4.1.8) (ValDH) is a NADP-dependent enzyme that catalyzes the reversible deamination of L-valine into 3-methyl-2-oxobutanoate [5].

[0661] These dehydrogenases are structurally and functionally related. A conserved lysine residue located in a glycine-rich region has been implicated in the catalytic mechanism. The conservation of the region around this residue allows the derivation of a signature pattern for this type of enzyme.

[0662] Consensus pattern: [LIV]-x(2)-G-G-[SAG]-K-x-[GV]-x(3)-[DNST]-[PL] [K is the active site residue]. Sequences known to belong to this class detected by the pattern: ALL.

[0663] Note: all known sequences from this family have Pro in the last position of the pattern, with the exception of yeast GluDH, which has Leu.
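
As a sketch of how this pattern and the accompanying note could be checked against a sequence, the regular expression below captures both the active site lysine and the residue in the last pattern position (Pro, or Leu as noted for yeast GluDH). The function name `catalytic_lysine` and the test sequences are invented for illustration and are not taken from any real dehydrogenase.

```python
import re

# Group 1 is the active site lysine; group 2 is the final [PL] position.
GLFV_SITE = re.compile(r"[LIV].{2}GG[SAG](K).[GV].{3}[DNST]([PL])")


def catalytic_lysine(sequence):
    """Return (1-based lysine position, last pattern residue) or None."""
    match = GLFV_SITE.search(sequence.upper())
    if match is None:
        return None
    return match.start(1) + 1, match.group(2)


print(catalytic_lysine("MSIDLGGGKAGVRFDPKE"))  # invented sequence ending the motif in Pro
print(catalytic_lysine("MSIDLGGGKAGVRFDLKE"))  # same, with Leu in the last position
```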

[1] Britton K.L., Baker P.J., Rice D.W., Stillman T.J. Eur. J. Biochem. 209:851-859(1992).
[2] Benachenhou-Lahfa N., Forterre P., Labedan B. J. Mol. Evol. 36:335-346(1993).
[3] Nagata S., Tanizawa K., Esaki N., Sakamoto Y., Ohshima T., Tanaka H., Soda K. Biochemistry 27:9056-9062(1988).
[4] Takada H., Yoshimura T., Ohshima T., Esaki N., Soda K. J. Biochem. 109:371-376(1991).
[5] Hutchinson C.R., Tang L. J. Bacteriol. 175:4176-4185(1993).

[0664] 222. GMC oxidoreductases signatures 

The following FAD flavoprotein oxidoreductases have been found [1,2] to be evolutionarily related. These enzymes, which are called 'GMC oxidoreductases', are listed below.

- Glucose oxidase (EC 1.1.3.4) (GOX) from Aspergillus niger. Reaction catalyzed: glucose + oxygen -> delta-gluconolactone + hydrogen peroxide.
- Methanol oxidase (EC 1.1.3.13) (MOX) from fungi. Reaction catalyzed: methanol + oxygen -> formaldehyde + hydrogen peroxide.
- Choline dehydrogenase (EC 1.1.99.1) (CHD) from bacteria. Reaction catalyzed: choline + unknown acceptor -> betaine aldehyde + reduced acceptor.
- Glucose dehydrogenase (GLD) (EC 1.1.99.10) from Drosophila. Reaction catalyzed: glucose + unknown acceptor -> delta-gluconolactone + reduced acceptor.
- Cholesterol oxidase (CHOD) (EC 1.1.3.6) from Brevibacterium sterolicum and Streptomyces strain SA-COO. Reaction catalyzed: cholesterol + oxygen -> cholest-4-en-3-one + hydrogen peroxide.
- AlkJ [3], an alcohol dehydrogenase from Pseudomonas oleovorans, which converts aliphatic medium-chain-length alcohols into aldehydes.

This family also includes a lyase:

- (R)-mandelonitrile lyase (EC 4.1.2.10) (hydroxynitrile lyase) from plants [4], an enzyme involved in cyanogenesis, the release of hydrogen cyanide from injured tissues.

These enzymes are proteins of size ranging from 556 (CHD) to 664 (MOX) amino acid residues which share a number of regions of sequence similarity. One of these regions, located in the N-terminal section, corresponds to the FAD ADP-binding domain. The function of the other conserved domains is not yet known; two of these domains were selected as signature patterns. The first one is located in the N-terminal section of these enzymes, about 50 residues after the ADP-binding domain, while the second one is located in the central section.

[0665] Consensus pattern: [GA]-[RKN]-x-[LIV]-G(2)-[GST](2)-x-[LIVM]-N-x(3)-[FYWA]-x(2)-[PAG]-x(5)-[DNESH]
Consensus pattern: [GS]-[PSTA]-x(2)-[ST]-P-x-[LIVM](2)-x(2)-S-G-[LIVM]-G

[1] Cavener D.R. J. Mol. Biol. 223:811-814(1992).
[2] Henikoff S., Henikoff J.G. Genomics 19:97-107(1994).
[3] van Beilen J.B., Eggink G., Enequist H., Bos R., Witholt B. Mol. Microbiol. 6:3121-3136(1992).
[4] Cheng I.P., Poulton J.E. Plant Cell Physiol. 34:1139-1143(1993).

[0666] 223. (GMP_synt_C)
Glutamine amidotransferases class-I active site
[0667] A large group of biosynthetic enzymes are able to catalyze the removal of the ammonia group from glutamine and then to transfer this group to a substrate to form a new carbon-nitrogen group. This catalytic activity is known as glutamine amidotransferase (GATase) (EC 2.4.2.-) [1]. The GATase domain exists either as a separate polypeptidic subunit or as part of a larger polypeptide fused in different ways to a synthase domain. On the basis of sequence similarities two classes of GATase domains have been identified [2,3]: class-I (also known as trpG-type) and class-II (also known as purF-type). Class-I GATase domains have been found in the following enzymes:

- The second component of anthranilate synthase (AS) (EC 4.1.3.27). AS catalyzes the biosynthesis of anthranilate from chorismate and glutamine. AS is generally a dimeric enzyme: the first component can synthesize anthranilate using ammonia rather than glutamine, whereas component II provides the GATase activity. In some bacteria and in fungi the GATase component of AS is part of a multifunctional protein that also catalyzes other steps of the biosynthesis of tryptophan.
- The second component of 4-amino-4-deoxychorismate (ADC) synthase (EC 4.1.3.-), a dimeric prokaryotic enzyme that functions in the pathway that catalyzes the biosynthesis of para-aminobenzoate (PABA) from chorismate and glutamine. The second component (gene pabA) provides the GATase activity [4].
- CTP synthase (EC 6.3.4.2). CTP synthase catalyzes the final reaction in the biosynthesis of pyrimidine, the ATP-dependent formation of CTP from UTP and glutamine. CTP synthase is a single chain enzyme that contains two distinct domains; the GATase domain is in the C-terminal section [2].
- GMP synthase (glutamine-hydrolyzing) (EC 6.3.5.2). GMP synthase catalyzes the ATP-dependent formation of GMP from xanthosine 5'-phosphate and glutamine. GMP synthase is a single chain enzyme that contains two distinct domains; the GATase domain is in the N-terminal section [5].
- Glutamine-dependent carbamoyl-phosphate synthase (EC 6.3.5.5) (GD-CPSase), an enzyme involved in both arginine and pyrimidine biosynthesis and which catalyzes the ATP-dependent formation of carbamoyl phosphate from glutamine and carbon dioxide. In bacteria GD-CPSase is composed of two subunits: the large chain (gene carB) provides the CPSase activity, while the small chain (gene carA) provides the GATase activity. In yeast the enzyme involved in arginine biosynthesis is also composed of two subunits: CPA1 (GATase) and CPA2 (CPSase). In most eukaryotes, the first three steps of pyrimidine biosynthesis are catalyzed by a large multifunctional enzyme (called URA2 in yeast, rudimentary in Drosophila, and CAD in mammals). The GATase domain is located at the N-terminal extremity of this polyprotein [6].
- Phosphoribosylformylglycinamidine synthase II (EC 6.3.5.3), an enzyme that catalyzes the fourth step in the de novo biosynthesis of purines. In some species of bacteria, FGAM synthase II is composed of two subunits: a small chain (gene purQ) which provides the GATase activity and a large chain (gene purL) which provides the aminator activity.
- The histidine amidotransferase hisH, an enzyme that catalyzes the fifth step in the biosynthesis of histidine in prokaryotes.

[0668] In the second component of AS a cysteine has been shown [7] to be essential for the amidotransferase activity. The sequence around this residue is well conserved in all the above GATase domains and can be used as a signature pattern for class-I GATase.
[0669] Consensus pattern: [...] [C is the active site residue]. Sequences known to belong to this class detected by the pattern except [...]
[0670] Note: in the first position of the pattern Pro is found in all cases except in the slime mold GD-CPSase, where it is replaced by Ala.

[1] Buchanan J.M. Adv. Enzymol. 39:91-183(1973).
[2] Weng M., Zalkin H. J. Bacteriol. 169:3023-3028(1987).
[3] Nyunoya H., Lusty C.J. J. Biol. Chem. 259:9790-9798(1984).
[4] Crawford I.P. Annu. Rev. Microbiol. 43:567-600(1989).
[5] Zalkin H., Argos P., Narayana S.V.L., Tiedeman A.A., Smith J.M. J. Biol. Chem. 260:3350-3354(1985).
[6] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B. BioEssays 15:157-164(1993).
[7] Tso J.Y., Hermodson M.A., Zalkin H. J. Biol. Chem. 255:1451-1457(1980).

[0671] 224. Glutathione peroxidases signatures (GSHPx) 

Glutathione peroxidase (EC 1.11.1.9) (GSHPx) [1,2] is an enzyme that catalyzes the reduction of hydroxyperoxides by glutathione. Its main function is to protect against the damaging effect of endogenously formed hydroxyperoxides. In higher vertebrates at least four forms of GSHPx are known to exist: a ubiquitous cytosolic form (GSHPx-1), a gastrointestinal cytosolic form (GSHPx-GI) [3], a plasma secreted form (GSHPx-P) [4], and an epididymal secretory form (GSHPx-EP). In addition to these characterized forms, the sequence of a protein of unknown function [5] has been shown to be evolutionarily related to those of GSHPxs. In filarial nematode parasites such as Brugia pahangi the major soluble cuticular protein, known as gp29, is a secreted GSHPx which could provide a mechanism of resistance to the immune reaction of the mammalian host by neutralizing the products of the oxidative burst of leukocytes [6]. Escherichia coli protein btuE, a periplasmic protein involved in the transport of vitamin B12, is also evolutionarily related to GSHPxs; the significance of this relationship is not yet clear. Selenium, in the form of selenocysteine [7], is part of the catalytic site of GSHPx. The sequence around the selenocysteine residue is moderately well conserved in GSHPxs and the related proteins and can be used as a signature pattern. As a second signature for this family of proteins a highly conserved octapeptide located in the central section of these proteins was selected.
[0672] Consensus pattern: [GN]-[RKHNFYC]-x-[LIVMFC]-[LIVMF](2)-x-N-[VT]-x-[STC]-x-C-[GA]-x-T [C is the active site selenocysteine residue]

Consensus pattern: [LIV]-[AGD]-F-P-[CS]-[NG]-Q

[1] Mannervik B. Meth. Enzymol. 113:490-495(1985).
[2] Mullenbach G.T., Tabrizi A., Irvine B.D., Bell G.I., Tainer J.A., Hallewell R.A. Protein Eng. 2:239-246(1988).
[3] Chu F.F., Doroshow J.H., Esworthy R.S. J. Biol. Chem. 268:2571-2576(1993).
[4] Takahashi K., Akasaka M., Yamamoto Y., Kobayashi C., Mizoguchi J., Koyama J. J. Biochem. 108:145-148(1990).
[5] Dunn D.K., Howells D.D., Richardson J., Goldfarb R.S. Nucleic Acids Res. 17:6390-6390(1989).
[6] Cookson E., Blaxter M.L., Selkirk M.E. Proc. Natl. Acad. Sci. U.S.A. 89:5837-5841(1992).
[7] Stadtman T.C. Annu. Rev. Biochem. 59:111-127(1990).

[0673] 225. (GST) 
Glutathione S-transferases 

[0674] Function: conjugation of reduced glutathione to a variety of targets. Also included in the alignment, but not GSTs, are:
- S-crystallins from squid. Similarity to GST was previously noted.
- Eukaryotic elongation factors 1-gamma. Not known to have GST activity; similarity not previously recognized. Supported by HMM and manual alignment inspection.
- HSP26 family of stress-related proteins, including auxin-regulated proteins in plants and stringent starvation proteins in E. coli. Not known to have GST activity. Similarity not previously recognized. Supported by HMM and manual alignment inspection.

Alignment spans entire protein.
[0675] 226. GTP1/OBG family signature

A widespread family of GTP-binding proteins has been recently characterized [1,2]. This family currently includes:

- Mouse and Xenopus protein DRG.
- Human protein DRG2.
- Drosophila protein 128up.
- Fission yeast protein gtp1.
- A Halobacterium cutirubrum hypothetical protein in a ribosomal protein gene cluster.
- Bacillus subtilis protein obg. Obg has been experimentally shown to bind GTP.
- Escherichia coli hypothetical protein yhbZ.
- Haemophilus influenzae hypothetical protein HI0877.
- Mycoplasma genitalium hypothetical protein MG384.
- Yeast hypothetical protein YAL036C (FUN11).
- Yeast hypothetical protein YGR173w.
- Caenorhabditis elegans hypothetical protein C02F5.3.

The function of the proteins that belong to this family is not yet known. They are polypeptides of about 40 to 48 Kd which contain the five small sequence elements characteristic of GTP-binding proteins [3]. As a signature pattern the region that corresponds to the ATP/GTP B motif (also called G-3 in GTP-binding proteins) was selected.
[0676] Consensus pattern: D-[LIVM]-P-G-[LIVM](2)-[DEY]-[GN]-A-x(2)-G-x-G

[1] Sazuka T., Tomooka Y., Ikawa Y., Noda M., Kumar S. Biochem. Biophys. Res. Commun. 189:363-370(1992).
[2] Hudson J.D., Young P.G. Gene 125:191-193(1993).
[3] Bourne H.R., Sanders D.A., McCormick F. Nature 349:117-127(1991).

[0677] 227. (GTP_EFTU 1 ) 
ATP/GTP-binding site motif A (P-loop) 

[0678] From sequence comparisons and crystallographic data analysis it has been shown [1,2,3,4,5,6] that an appreciable proportion of proteins that bind ATP or GTP share a number of more or less conserved sequence motifs. The best conserved of these motifs is a glycine-rich region, which typically forms a flexible loop between a beta-strand and an alpha-helix. This loop interacts with one of the phosphate groups of the nucleotide. This sequence motif is generally referred to as the 'A' consensus sequence [1] or the 'P-loop' [5]. There are numerous ATP- or GTP-binding proteins in which the P-loop is found. Listed below are a number of protein families for which the relevance of the presence of such a motif has been noted:

- ATP synthase alpha and beta subunits (see <PDOC00137>).
- Myosin heavy chains.
- Kinesin heavy chains and kinesin-like proteins (see <PDOC00343>).
- Dynamins and dynamin-like proteins (see <PDOC00362>).
- Guanylate kinase (see <PDOC00670>).
- Thymidine kinase (see <PDOC00524>).
- Thymidylate kinase (see <PDOC01034>).
- Shikimate kinase (see <PDOC00868>).
- Nitrogenase iron protein family (nifH/frxC) (see <PDOC00580>).
- ATP-binding proteins involved in 'active transport' (ABC transporters) [7] (see <PDOC00185>).
- DNA and RNA helicases [8,9,10].
- GTP-binding elongation factors (EF-Tu, EF-1alpha, EF-G, EF-2, etc.).
- Ras family of GTP-binding proteins (Ras, Rho, Rab, Ral, Ypt1, SEC4, etc.).
- Nuclear protein ran (see <PDOC00859>).
- ADP-ribosylation factors family (see <PDOC00781>).
- Bacterial dnaA protein (see <PDOC00771>).
- Bacterial recA protein (see <PDOC00131>).
- Bacterial recF protein (see <PDOC00539>).
- Guanine nucleotide-binding proteins alpha subunits (Gi, Gs, Gt, G0, etc.).
- DNA mismatch repair proteins mutS family (see <PDOC[...]>).
- Bacterial type II secretion system protein E (see <PDOC00567>).

Not all ATP- or GTP-binding proteins are picked up by this motif. A number of proteins escape detection because the structure of their ATP-binding site is completely different from that of the P-loop. Examples of such proteins are the E1-E2 ATPases or the glycolytic kinases. In other ATP- or GTP-binding proteins the flexible loop exists in a slightly different form; this is the case for tubulins or protein kinases. A special mention must be reserved for adenylate kinase, in which there is a single deviation from the P-loop pattern: in the last position Gly is found instead of Ser or Thr.
Consensus pattern: [AG]-x(4)-G-K-[ST]
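
Because the P-loop pattern is short and glycine-rich, candidate sites can overlap. The sketch below scans a sequence for every occurrence, overlapping ones included, by wrapping the pattern in a lookahead. The function name `find_p_loops` is hypothetical; the first test sequence is an illustrative Ras-like N-terminal stretch, and the second is an invented fragment ending in Gly to mirror the adenylate kinase deviation noted above.

```python
import re

# The lookahead consumes no residues, so overlapping sites are all reported.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")


def find_p_loops(sequence):
    """Return (1-based start, matched segment) for every P-loop candidate."""
    return [(m.start() + 1, m.group(1)) for m in P_LOOP.finditer(sequence.upper())]


print(find_p_loops("MTEYKLVVVGAGGVGKSALT"))  # Ras-like stretch
print(find_p_loops("GGPGSGKGTQ"))            # Gly in the last position: not detected
```

A hit from such a short pattern is only a candidate, since the motif is expected to occur by chance in unrelated proteins.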

[1] Walker J.E., Saraste M., Runswick M.J., Gay N.J. EMBO J. 1:945-951(1982).
[2] Moller W., Amons R. FEBS Lett. 186:1-7(1985).
[3] Fry D.C., Kuby S.A., Mildvan A.S. Proc. Natl. Acad. Sci. U.S.A. 83:907-911(1986).
[4] Dever T.E., Glynias M.J., Merrick W.C. Proc. Natl. Acad. Sci. U.S.A. 84:1814-1818(1987).
[5] Saraste M., Sibbald P.R., Wittinghofer A. Trends Biochem. Sci. 15:430-434(1990).
[6] Koonin E.V. J. Mol. Biol. 229:1165-1174(1993).
[7] Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R., Gallagher M.P. J. Bioenerg. Biomembr. 22:571-592(1990).
[8] Hodgman T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).
[9] Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K., Schnier J., Slonimski P.P. Nature 337:121-122(1989).
[10] Gorbalenya A.E., Koonin E.V., Donchenko A.P., Blinov V.M. Nucleic Acids Res. 17:4713-4730(1989).

[0679] GTP-binding elongation factors signature (GTP_EFTU2)

Elongation factors [1,2] are proteins catalyzing the elongation of peptide chains in protein biosynthesis. In both prokaryotes and eukaryotes, there are three distinct types of elongation factors, as described in the following table:

Eukaryotes    Prokaryotes    Function
EF-1alpha     EF-Tu          Binds GTP and an aminoacyl-tRNA; delivers the latter to the A site of ribosomes.
EF-1beta      EF-Ts          Interacts with EF-1a/EF-Tu to displace GDP and thus allows the regeneration of GTP-EF-1a.
EF-2          EF-G           Binds GTP and peptidyl-tRNA and translocates the latter from the A site to the P site.

The GTP-binding elongation factor family also

ncludes't'he following proteins: - Eukaryotic peptide chain release factor GTP-binding subunits [3V These proteins 
interact with release factors that bind to ribosomes that have encountered a stop J/^^^^f.^"^^^^^^^^ 
them to induce release of the nascent polypeptide. The yeast protein was known as SUP2 ^^^fj^^^^'^.^^J^^ 
35 or GST1) and the human homolog as GSTI-Hs. - Prokaryotic peptide chain release factor 3 (RF-3) (gene prfC) RF- 
3 is a class-Il RF a GTP-binding protein that interacts with dass I RFs (see <PDOC00607>) and enhance their activity 
?4 Prokaryoti^ GTP-binding'pLein lepA and its homolog in yeas, (gene GUF1) ^aenorhabdms e e^^^^^^^ 

ZK1236 1) - Yeast HBS1 [5]. - Rat statin 81 [6], a protein of unknown function which is highly similar to EF-lalpha. - 
Prokaryotic selenocysteine-specific elongation factor selB [7]. which seems to replace EF-Tu for the insertion o^ s^ 
ienoSeine directed by the UGA codon. - The tetracycline resistance proteins tetMAetO [8,9] from vario^ ba^ena 
suchLcampylobacter jejuni. Enterococcusfaecalis.Streptococcusmutans and Ureaplasmaurealyt,cum.Te^a^^^^^^^ 

Wnds to the prokaryotic ribosomal 308 subunit and inhibits binding of aminoacyl-tRN/^^These Pjotems a^ol^ the 
inhibitory effect of tetracycline on protein synthesis. - Rhizobium nodulation protein nodQ [1 0]. - Eschenchia «) i hy- 
pothetic^al p«,tein yihK [11]..n EF-1-a.pha. a specific region has been shown I12]|to ^« '"^'^^ ^^^^^^^^^ 
change mediated by the hydrolysis of GTP to GDR This region is conserved in both EF-lalpha/EF-Tu as well as EF 
SS-G and thus seems typical for GTP-dependent proteins which bind non-initiatortRNAstothenbosome. The pattem 

develooed for this family of proteins include that conserved region. 

Consensus pattern: D-[KRSTGANQFYW]-x(3)-E-[KRAQ]-x-[RKQD]-[GC]-[IVMK]-[ST]-[IV]-x(2)-[GSTACKRNQ]

[ 1] Concise Encyclopedia Biochemistry, Second Edition, Walter de Gruyter, Berlin New-York (1988).
[ 2] Moldave K. Annu. Rev. Biochem. 54:1109-1149(1985).
[ 3] Stansfield I., Jones K.M., Kushnirov V.V., Dagkesamanskaya A.R., Poznyakovski A.I., Paushkin S.V., Nierras C.R., Cox B.S., Ter-Avanesyan M.D., Tuite M.F. EMBO J. 14:4365-4373(1995).
[ 4] Grentzmann G., Brechemier-Baey D., Heurgue-Hamard V., Buckingham R.H. J. Biol. Chem. 270:10595-10600(1995).
[ 5] Nelson R.J., Ziegelhoffer T., Nicolet C., Werner-Washburne M., Craig E.A. Cell 71:97-105(1992).
[ 6] Ann D.K., Moutsatsos I.K., Nakamura T., Lin H.H., Mao P.-L., Lee M.-J., Chin S., Liem R.K.H., Wang E. J. Biol. Chem. 266:10429-10437(1991).
[ 7] Forchhammer K., Leinfelder W., Bock A. Nature 342:453-456(1989).
[ 8] Manavathu E.K., Hiratsuka K., Taylor D.E. Gene 62:17-26(1988).
[ 9] Leblanc D.J., Lee L.N., Titmas B.M., Smith C.J., Tenover F.C. J. Bacteriol. 170:3618-3626(1988).
[10] Cervantes E., Sharma S.B., Maillet F., Vasse J., Truchet G., Rosenberg C. Mol. Microbiol. 3:745-755(1989).
[11] Plunkett G. III, Burland V.D., Daniels D.L., Blattner F.R. Nucleic Acids Res. 21:3391-3398(1993).
[12] Moller W., Schipper A., Amons R. Biochimie 69:983-989(1987).

[0681] 228. GTP cyclohydrolase II. 
GTP cyclohydrolase II catalyses the first committed step in the biosynthesis of riboflavin.

[0682] [1] Richter G, Ritz H, Katzenmeier G, Volk R, Kohnle A, Lottspeich F, Allendorf D, Bacher A. J Bacteriol 1993;175:4045-4051.

[0683] 229. Galactose-1-phosphate uridyl transferase signatures (GalP_UDP_transf)

Galactose-1-phosphate uridyl transferase (EC 2.7.7.10) (galT) catalyzes the transfer of a uridyldiphosphate group on galactose (or glucose) 1-phosphate. During the reaction, the uridyl moiety links to a histidine residue. In the Escherichia coli enzyme, it has been shown [1] that two histidine residues separated by a single proline residue are essential for enzyme activity. On the basis of sequence similarities, two apparently unrelated families seem to exist. Class-I enzymes are found in eukaryotes as well as some bacteria such as Escherichia coli or Streptomyces lividans, while class-II enzymes have been found so far only in bacteria such as Bacillus subtilis or Lactobacillus helveticus [2]. Signature patterns for both families were developed. For class-I enzymes the signature is based on the active site residues. For class-II enzymes a region which also includes two conserved histidines was chosen.
Consensus pattern: F-E-N-[RK]-G-x(3)-G-x(4)-H-P-H-x-Q [The two H's are the active site residues]
[0684] Consensus pattern: D-L-P-I-V-G-G-[ST]-[LIVM](2)-[SA]-H-[DEN]-H-[FY]-Q-G-G
Note: class-I enzymes are structurally related to the HIT family of proteins (see <PDOC00694>).
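Because the two galT families have separate signatures, a sequence can be assigned to a class by testing both patterns. The sketch below is a minimal illustration only: the two regular expressions are hand-translated from the consensus patterns above, and the function name `classify_galt` and the example sequences are hypothetical.

```python
import re

# Hand-translated from the two consensus patterns given in the text.
CLASS_I = re.compile(r"FEN[RK]G.{3}G.{4}HPH.Q")
CLASS_II = re.compile(r"DLPIVGG[ST][LIVM]{2}[SA]H[DEN]H[FY]QGG")


def classify_galt(sequence):
    """Return the list of galT class signatures found in the sequence."""
    hits = []
    if CLASS_I.search(sequence):
        hits.append("class-I")
    if CLASS_II.search(sequence):
        hits.append("class-II")
    return hits


if __name__ == "__main__":
    # Synthetic sequences built to satisfy each pattern, not real proteins.
    print(classify_galt("MKFENRGAAAGAAAAHPHAQLL"))
    print(classify_galt("DLPIVGGSLLSHDHFQGG"))
```

An empty result means that neither signature was found, which does not by itself rule out a distant family member.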

[ 1] Reichardt J.K.V., Berg P. Nucleic Acids Res. 16:9017-9026(1988).
[ 2] Mollet B., Pilloud N. J. Bacteriol. 173:4464-4473(1991).

[0685] 230. Gamma-thionins family signature
[0686] The following small plant proteins are evolutionarily related:

- Gamma-thionins from wheat endosperm (gamma-purothionins) and barley (gamma-hordothionins) which are toxic to animal cells and inhibit protein synthesis in cell free systems [1].
- A flower-specific thionin (FST) from tobacco [2].
- Antifungal proteins (AFP) from the seeds of Brassicaceae species such as radish, mustard, turnip and Arabidopsis thaliana [3].
- Inhibitors of insect alpha-amylases from sorghum [4].
- Probable protease inhibitor P322 from potato.
- A germination-related protein from cowpea [5].
- Anther-specific protein SF18 from sunflower [6]. SF18 is a protein that contains a gamma-thionin domain at its N-terminus and a proline-rich C-terminal domain.
- Soybean sulfur-rich protein SE60 [7].
- Vicia faba antibacterial peptides fabatin-1 and -2.

[0687] In their mature form, these proteins generally consist of about 45 to 50 amino-acid residues. As shown in the following schematic representation, these peptides contain eight conserved cysteines involved in disulfide bonds.

xxCxxxxxxxxxxCxxxxxCxxxCxxxxxxxxxCxxxxxxCxCxxxC

'C': conserved cysteine involved in a disulfide bond.
'*': position of the pattern.

[0688] Consensus pattern: [KRG]-x-C-x(3)-[SV]-x(2)-[FYWH]-x-[GF]-x-C-x(5)-C-x(3)-C [The four C's are involved in disulfide bonds]

[1] Bruix M., Jimenez M.A., Santoro J., Gonzalez C., Colilla F.J., Mendez E., Rico M. Biochemistry 32:715-724(1993).
[2] Gu Q., Kawata E.E., Morse M.-J., Wu H.-M., Cheung A.Y. Mol. Gen. Genet. 234:89-96(1992).
[3] Terras F.R.G., Torrekens S., van Leuven F., Osborn R.W., Vanderleyden J., Cammue B.P.A., Broekaert W.F. FEBS Lett. 316:233-240(1993).
[4] Bloch C. Jr., Richardson M. FEBS Lett. 279:101-104(1991).
[5] Ishibashi N., Yamauchi D., Minamikawa T. Plant Mol. Biol. 15:59-64(1990).
[7] Choi Y., Choi Y.D., Lee J.S. Plant Physiol. 101:699-700(1993).
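The gamma-thionin consensus pattern given above marks four of the conserved cysteines. As a small sketch of how those positions could be reported for a candidate sequence, the Python fragment below wraps each pattern cysteine in a capture group. The regular expression is hand-translated from the consensus pattern; the function name `pattern_cysteines` and the test sequence are hypothetical.

```python
import re

# [KRG]-x-C-x(3)-[SV]-x(2)-[FYWH]-x-[GF]-x-C-x(5)-C-x(3)-C, with one
# capture group per cysteine so that each position can be read back.
THIONIN = re.compile(r"[KRG].(C).{3}[SV].{2}[FYWH].[GF].(C).{5}(C).{3}(C)")


def pattern_cysteines(sequence):
    """Return 1-based positions of the four pattern cysteines, or None."""
    match = THIONIN.search(sequence)
    if match is None:
        return None
    return [match.start(group) + 1 for group in range(1, 5)]


if __name__ == "__main__":
    # Synthetic sequence constructed to satisfy the pattern.
    print(pattern_cysteines("KACAAASAAFAGACAAAAACAAAC"))
```

Only the first match is reported; a sequence with several thionin-like domains would need `finditer` instead of `search`.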

[0689] 231. Gelsolin. Gelsolin repeat. Number of members: 170

[0690] [1] Medline: 97433077. The crystal structure of plasma gelsolin: implications for actin severing, capping, and nucleation. Burtnick LD, Koepf EK, Grimes J, Jones EY, Stuart DI, McLaughlin PJ, Robinson RC; Cell 1997;90:661-670.
[0691] 232. Germin family signature
Germins [1] are a family of homopentameric cereal glycoproteins expressed during germination which may play a role in altering the properties of cell walls during germinative growth. It has been shown that wheat and barley germins act as oxalate oxidases (EC 1.2.3.4), an enzyme that catalyzes the oxidative degradation of oxalate to carbonate and hydrogen peroxide. Germins are highly similar to:

- Germin-like proteins from various plants such as rape, v'ole or white mustard.
- Slime mold spherulins 1a and 1b, which are proteins that accumulate specifically during spherulation, a process induced by various forms of environmental stress which leads to encystment and dormancy.

As a signature pattern the best conserved region was selected: a decapeptide located in the central section of these proteins.
[0692] Consensus pattern: G-x(4)-H-x-H-P-x-A-x-E-[LIVM]
[0693] [1] Lane B.G. FASEB J. 8:294-301(1994).
[0694] 233. (GlutR)

[0695] Delta-aminolevulinic acid (ALA) is the obligatory precursor for the synthesis of all tetrapyrroles including porphyrin derivatives such as chlorophyll and heme. ALA can be synthesized via two different pathways: the Shemin (or C4) pathway, which involves the single step condensation of succinyl-CoA and glycine and which is catalyzed by ALA synthase (EC 2.3.1.37), and the C5 pathway from the five-carbon skeleton of glutamate. The C5 pathway operates in the chloroplast of plants and algae, in cyanobacteria, in some eubacteria and in archaebacteria.
[0696] The initial step in the C5 pathway is carried out by glutamyl-tRNA reductase (GluTR) [1], which catalyzes the NADP-dependent conversion of glutamate-tRNA(Glu) to glutamate-1-semialdehyde (GSA) with the concomitant release of tRNA(Glu), which can then be recharged with glutamate by glutamyl-tRNA synthetase.
[0697] GluTR is a protein of about 50 Kd (467 to 550 residues) which contains a few conserved regions. The best conserved region is located in positions 99 to 122 in the sequence of known GluTR. This region seems important for the activity of the enzyme. We have developed a signature pattern from that conserved region.
[0698] Consensus pattern: [IV]-[LIT]-[STAG]-Q-[LIVM]-[KR] Sequences known to belong to this class detected by the pattern: ALL.
[0699] [1] Jahn D., Verkamp E., Soell D. Trends Biochem. Sci. 17:215-218(1992).
[0700] 234. (Glycoprotease) 

Glycoprotease family signature (aka Peptidase_M22)
[0701] Glycoprotease (GCP) (EC 3.4.24.57) [1], or O-sialoglycoprotein endopeptidase, is a metalloprotease secreted by Pasteurella haemolytica which specifically cleaves O-sialoglycoproteins such as glycophorin A. The sequence of GCP is highly similar to the following uncharacterized proteins:

- Escherichia coli hypothetical protein ygjD (ORF-X).
- Bacillus subtilis hypothetical protein ydiE.
- Mycobacterium leprae hypothetical protein U229E.
- Mycobacterium tuberculosis hypothetical protein MtCY78.10.
- Synechocystis strain PCC 6803 hypothetical protein slr0807.
- Methanococcus jannaschii hypothetical protein MJ1130.
- Haloarcula marismortui hypothetical protein in HSH 3' region.
- Yeast hypothetical protein YKR038c.
- Yeast hypothetical protein QRI7.

[0702] One of the conserved regions contains two conserved histidines. It is possible that this region is involved in coordinating a metal ion such as zinc.
[0703] Consensus pattern: [KR]-[GSAT]-x(4)-[FYWLH]-[DQNGK]-x-P-x-[LIVMFY]-x(3)-H-x(2)-[AG]-H-[LIVM] Sequences known to belong to this class detected by the pattern: ALL.

[0704] Note: these proteins belong to family M22 in the classification of peptidases [2,E1].

[ 1] Abdullah K.M., Lo R.Y.C., Mellors A. J. Bacteriol. 173:5597-5603(1991).
[ 2] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[0705] 235. (Glucosamine_iso)

Glucosamine/galactosamine-6-phosphate isomerases signature

Glucosamine-6-phosphate isomerase (EC 5.3.1.10) (or Glc-6-P deaminase) is the enzyme responsible for the conversion of glucosamine 6-phosphate into fructose 6-phosphate [1]. It is the last specific step in the pathway for N-acetylglucosamine (GlcNAC) utilization in bacteria such as Escherichia coli (gene nagB) or in fungi such as Candida albicans (gene NAG1). Glc-6-P isomerase is evolutionarily related to:

- A putative Escherichia coli galactosamine-6-phosphate isomerase (gene agaI) [2].
- Escherichia coli hypothetical protein yieK.
- Bacillus subtilis hypothetical protein ybfT.

As a signature pattern a conserved region located in the central part of these enzymes was selected. This region contains a conserved histidine which has been shown [1], in nagB, to be important for the pyranose ring-opening step of the catalytic mechanism.

[0706] Consensus pattern: [LIVM]-x(3)-G-x-[LIT]-x-[LIV]-x-[LIVM]-x-G-[LIVM]-G-x-[DEN]-G-H

[ 1] Oliva G., Fontes M.R.M., Garratt R.C., Altamirano M.M., Calcagno M.L., Horjales E. Structure 3:1323-1332(1995).
[ 2] Reizer J., Ramseier T.M., Reizer A., Charbit A., Saier M.H. Jr. Microbiology 142:231-250(1996).

[0707] 236. Pneumovirus attachment glycoprotein G (glycoprotein G)

[0708] This family includes attachment proteins from respiratory syncytial virus. Glycoprotein G has not been shown to have any neuraminidase or hemagglutinin activity (Swiss-Prot). The amino terminus is thought to be cytoplasmic, and the carboxyl terminus extracellular. The extracellular region contains four completely conserved cysteine residues.
[0709] [1] Johnson PR, Spriggs MK, Olmsted RA, Collins PL. Proc Natl Acad Sci U S A 1987;84:5625-5629.
[0710] 237. Glycosyl transferases group 1

[0711] Mutations in this domain of Swiss: P37287 lead to disease (paroxysmal nocturnal haemoglobinuria). Members of this family transfer activated sugars to a variety of substrates, including glycogen, fructose-6-phosphate and lipopolysaccharides. Members of this family transfer UDP, ADP, GDP or CMP linked sugars. The eukaryotic glycogen synthases may be distant members of this family.
[0712] 238. Glycosyl transferases (Glycos_transf_2)

[0713] Diverse family, transferring sugar from UDP-glucose, UDP-N-acetyl-galactosamine, GDP-mannose or CDP-abequose, to a range of substrates including cellulose, dolichol phosphate and teichoic acids.
[0714] 239. (Glycos_transf_3)
Thymidine and pyrimidine-nucleoside phosphorylases signature

[0715] Thymidine phosphorylase (EC 2.4.2.4) catalyzes the reversible phosphorolysis of thymidine, deoxyuridine and their analogues to their respective bases and 2-deoxyribose 1-phosphate. This enzyme regulates the availability of thymidine and is therefore essential to nucleic acid metabolism.

[0716] In Escherichia coli (gene deoA), the enzyme is a dimer of identical subunits of about 48 Kd [1]. In humans it was first identified as platelet-derived endothelial cell growth factor (PD-ECGF) [E1] before being recognized [2] as thymidine phosphorylase.

[0717] Bacterial pyrimidine-nucleoside phosphorylase (EC 2.4.2.2) (gene pdp) [3] is an enzyme evolutionarily and structurally related to thymidine phosphorylase.

[0718] A well conserved region of 19 residues located in the N-terminal part of these proteins was selected as a signature pattern for these enzymes.

[0719] Consensus pattern: S-[GS]-R-[GA]-[LIV]-x(2)-[TA]-[GA]-G-T-x-D-x-[LIV]-E Sequences known to belong to this class detected by the pattern: ALL.

[ 1] Walter M.R., Cook W.J., Cole L.B., Short S.A., Koszalka G.W., Krenitsky T.A., Ealick S.E. J. Biol. Chem. 265:14016-14022(1990).
[ 2] Furukawa T., Yoshimura A., Sumizawa T., Haraguchi M., Akiyama S.-I., Fukui K., Yamada Y. Nature 356:668-668(1992).
[ 3] Saxild H.H., Andersen L.N., Hammer K. J. Bacteriol. 178:424-434(1996).
[0720] 240. Glycos_transf_4. Glycosyl transferase. Number of members: 44.
[0721] [1] Medline: 95252686. A family of UDP-GlcNAc/MurNAc: polyisoprenol-P GlcNAc/MurNAc-1-P transferases.
[0722] 241. Glycosyl hydrolases family 15. 21 members.

[0723] 242. Glycosyl hydrolases family 16 signature

It has been shown [1] that the following glycosyl hydrolases can be classified into a single family on the basis of sequence similarities:

- Bacterial beta-1,3-1,4-glucanases, or lichenases, (EC 3.2.1.73) mainly from Bacillus but also from Clostridium thermocellum (gene licB), Fibrobacter succinogenes and Rhodothermus marinus (gene bglA).
- Bacillus circulans beta-1,3-glucanase A1 (EC 3.2.1.39) (gene glcA).
- Laminarinase (EC 3.2.1.6) from Clostridium thermocellum (gene lam1).
- Streptomyces coelicolor agarase (EC 3.2.1.81) (gene dagA).
- Alteromonas carrageenovora kappa-carrageenase (EC 3.2.1.83) (gene cgkA).

Two closely clustered conserved glutamates have been shown [2] to be involved in the catalytic activity of Bacillus licheniformis lichenase.

[0724] Consensus pattern: E-[LIV]-D-[LIV]-x(0,1)-E-x(2)-[GQ]-[KRNF]-x-[PSTA] [The two E's are active site residues]

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).
[ 2] Juncosa M., Pons J., Dot T., Querol E., Planas A. J. Biol. Chem. 269:14530-14535(1994).

[0725] 243. Glycosyl hydrolases family 17 signature

It has been shown [1,2] that the following glycosyl hydrolases can be classified into a single family on the basis of sequence similarities:

- Glucan endo-1,3-beta-glucosidases (EC 3.2.1.39) (endo-(1->3)-beta-glucanase) from various plants. This enzyme may be involved in the defense of plants against pathogens through its ability to degrade fungal cell wall polysaccharides.
- Glucan 1,3-beta-glucosidase (EC 3.2.1.58) (exo-(1->3)-beta-glucanase) from yeast (gene BGL2). This enzyme may play a role in cell expansion during growth, in cell-cell fusion during mating, and in spore release during sporulation.
- Lichenases (EC 3.2.1.73) (endo-(1->3,1->4)-beta-glucanase) from various plants.

The best conserved region in the sequence of these enzymes is located in their central section. This region contains a conserved tryptophan residue which could be involved in the interaction with the glucan substrates [2] and it also contains a conserved glutamate which has been shown [3] to act as the nucleophile in the catalytic mechanism. This region was used as a signature pattern.

Consensus pattern: [LIVM]-x-[LIVMFYWA](3)-[STAG]-E-[STA]-G-W-P-[STN]-x-[SAGQ] [E is an active site residue]

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).
[ 2] Ori N., Sessa G., Lotan T., Himmelhoch S., Fluhr R. EMBO J. 9:3429-3436(1990).
[ 3] Varghese J.N., Garrett T.P.J., Colman P.M., Chen L., Hoj P.J., Fincher G.B. Proc. Natl. Acad. Sci. U.S.A. 91:2785-2789(1994).

Glyoxalase I (EC 4.4.1.5) catalyzes the transformation of methylglyoxal and glutathione into S-lactoylglutathione, which is then converted by glyoxalase II to lactic acid [1]. Glyoxalase I is a ubiquitous enzyme which binds one mole of zinc per subunit. The bacterial and yeast enzymes are monomeric while the mammalian one is homodimeric. The sequence of glyoxalase I is well conserved. In bacteria and mammals the enzyme is a protein of about 130 to 180 residues, while in fungi it is about twice as long. In these organisms the enzyme is built out of the tandem repeat of a homologous domain. Two signature patterns were derived. The first one is located in the N-terminal region while the second one is located in the central section of the protein and contains a conserved histidine that could be implicated in the binding of the zinc atom.

[0727] Consensus pattern: [HQl-[IVT^-x-[LIVFY^x-[lVl-x(5)-[STA]-x(2)-F-[Y^^x(2.3HLM^^ 

Lnsensus pattern: G-[NTKQ]-x(0.5)-[GA]-[LVFY]-[GH]-H-[lVF].[CGAl-x-PY.GL^^^^^^^ 

[0728] [ 1] Kim N.-S.. Umezawa Y.. Ohmura S.. Kato S. J. Biol. Chem. 268:11217-11221(1993). 

[0729] 245. (Glypican) 

Glypicans are a family of heparan sulfate proteoglycans which are anchored to cell membranes by a glycosylphosphatidylinositol (GPI) linkage. Structurally, these proteins consist of three separate domains:

a) A signal sequence.
b) An extracellular domain of about 500 residues that contains 12 conserved cysteines probably involved in disulfide bonds and which also contains the sites of attachment of the heparan sulfate glycosaminoglycan side chains.
c) A C-terminal hydrophobic region which is post-translationally removed after formation of the GPI-anchor.

[0730] The proteins known to belong to this family are:

- Glypican 1 (GPC1).
- Glypican 2 (GPC2) or cerebroglycan.
- Glypican 3 (GPC3) or OCI-5. In man, defects in GPC3 are the cause of an X-linked genetic disease, Simpson-Golabi-Behmel syndrome (SGBS).
- K-glypican.
- Glypican 5 (GPC5).
- Drosophila protein dally.

[0731] The signature pattern that was developed for glypicans is located in the central section of the extracellular domain and contains five of the conserved cysteines.

[0732] Consensus pattern: C-x(2)-C-x-G-[LIVM]-x(4)-P-C-x(2)-[FY]-C-x(2)-[LIVM]-x(2)-G-C [The C's are probably involved in disulfide bonds] Sequences known to belong to this class detected by the pattern: ALL, except for dally.

[ 1] Weksberg R., Squire J.A., Templeton D.M. Nat. Genet. 12:225-227(1996).
[ 2] Watanabe K., Yamada H., Yamaguchi Y. J. Cell Biol. 130:1207-1218(1995).

[0733] 246. Granins signatures 

Granins (chromogranins or secretogranins) [1] are a family of acidic proteins present in the secretory granules of a wide variety of endocrine and neuro-endocrine cells. The exact function(s) of these proteins is not yet known but they seem to be the precursors of biologically active peptides and/or they may act as helper proteins in the packaging of peptide hormones and neuropeptides. Three members of this family of proteins show some sequence similarities:

- Chromogranin A (CGA) [2]. CGA is a protein of about 420 residues; it is the precursor of the peptide pancreastatin, which strongly inhibits glucose-induced insulin release from the pancreas.
- Secretogranin 1 (chromogranin B). A sulfated protein of about 600 residues.
- Secretogranin 2 (chromogranin C). A sulfated protein of about 650 residues.

Apart from their subcellular location and the abundance of acidic residues (Asp and Glu), these proteins do not share many structural similarities. Only one short region, located in the C-terminal section, is conserved in all these proteins. Chromogranins A and B share a region of high similarity in their N-terminal section; this region includes two cysteine residues involved in a disulfide bond.

[0734] Consensus pattern: [DE]-[SN]-L-[SAN]-x(2)-[DE]-x-E-L

Consensus pattern: C-[LIVM](2)-E-[LIVM](2)-S-[DN]-[STA]-L-x-K-x-S-x(3)-[LIVM]-[STA]-x-E-C [The two C's are linked by a disulfide bond]

[ 1] Huttner W.B., Gerdes H.-H., Rosa P. Trends Biochem. Sci. 16:27-30(1991).
[ 2] Simon J.-P., Aunis D. Biochem. J. 262:1-13(1989).

[0735] 247. grpE protein signature 

In prokaryotes the grpE protein [1] stimulates, jointly with dnaJ, the ATPase activity of the dnaK chaperone. It seems to accelerate the release of ADP from dnaK, thus allowing dnaK to recycle more efficiently. GrpE is a protein of about 22 to 25 Kd. In yeast, an evolutionarily related mitochondrial protein (gene GRPE) has been shown [2] to associate with the mitochondrial hsp70 protein and thus to play a role in the import of proteins from the cytoplasm. As a signature pattern, the most conserved region of grpE was selected. It is located in the C-terminal section.
[0736] Consensus pattern: [FL]-[DN]-[PHEA]-x(2)-[HM]-x-A-[LIVMTN]-x(16,20)-G-[FY]-x(3)-[DEG]-x(2)-[LIVM]-[RI]-x-[SA]-x-V-x-[IV]
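The grpE pattern contains a variable-length spacer, x(16,20). The following sketch, offered only as an illustration, shows one way to check that a candidate segment fits the pattern and to read back the spacer length actually used. The regular expression is hand-translated from the consensus pattern above; the function name `grpe_gap_length` and the synthetic sequence are hypothetical.

```python
import re

# The named group "gap" captures the x(16,20) spacer of the grpE pattern.
GRPE = re.compile(
    r"[FL][DN][PHEA].{2}[HM].A[LIVMTN]"
    r"(?P<gap>.{16,20})"
    r"G[FY].{3}[DEG].{2}[LIVM][RI].[SA].V.[IV]"
)


def grpe_gap_length(sequence):
    """Return the spacer length of the first grpE match, or None if absent."""
    match = GRPE.search(sequence)
    return None if match is None else len(match.group("gap"))


if __name__ == "__main__":
    # Synthetic sequence: conserved flanks around a 17-residue spacer.
    candidate = "FDPAAHAAL" + "Q" * 17 + "GFAAADAALRASAVAI"
    print(grpe_gap_length(candidate))
```

A spacer shorter than 16 or longer than 20 residues makes the whole pattern fail, which is why the function returns None in those cases.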

[ 1] Georgopoulos C., Welch W. Annu. Rev. Cell Biol. 9:601-635(1993).
[ 2] Bolliger L., Deloche O., Glick B.S., Georgopoulos C., Jenoe P., Kronidou N., Horst M., Morishima N., Schatz G. EMBO J. 13:1998-2006(1994).

[0737] 248. Guanylate kinase signature and profile 

Guanylate kinase (EC 2.7.4.8) (GK) [1] catalyzes the ATP-dependent phosphorylation of GMP into GDP. It is essential for recycling GMP and, indirectly, cGMP. In prokaryotes (such as Escherichia coli), lower eukaryotes (such as yeast) and in vertebrates, GK is a highly conserved monomeric protein of about 200 amino acids. GK has been shown [2,3,4] to be structurally similar to the following proteins:

- Protein A57R (or SalG2R) from various strains of Vaccinia virus. This protein is highly similar to GK, but contains a frameshift mutation in the N-terminal section and could therefore be inactive in that virus.

The following proteins are characterized by the presence in their sequence of one or more copies of the DHR domain, a SH3 domain (see <PDOC50002>) as well as a C-terminal GK-like domain; these proteins are collectively termed MAGUKs (membrane-associated guanylate kinase homologs) [5]:

- Drosophila lethal(1)discs large-1 tumor suppressor protein (gene dlg1). This protein is associated with septate junctions in developing flies, and defects in the dlg1 gene cause neoplastic overgrowth of the imaginal disks.
- Mammalian tight junction protein Zo-1.
- A family of mammalian synaptic proteins that seem to interact with the cytoplasmic tail of NMDA receptor subunits. This family currently consists of SAP90/PSD-95, CHAPSYN-110/PSD-93, SAP97/DLG1 and SAP102.
- Vertebrate 55 Kd erythrocyte membrane protein (p55). p55 is a palmitoylated, membrane-associated protein of unknown function.
- Caenorhabditis elegans protein lin-2, which may play a structural role in the induction of the vulva.
- Rat protein CASK.
- Human protein DLG2.
- Human protein DLG3.

There is an ATP-binding site (P-loop) in the N-terminal section of GK. This region is not conserved in the GK-like domain of the above proteins, which are therefore unlikely to be kinases. However, these proteins retain the residues known, in GK, to be involved in the binding of GMP. As a signature pattern a highly conserved region was selected that contains two arginines and a tyrosine which are involved in GMP-binding.
[0738] Consensus pattern: T-[ST]-R-x(2)-[KR]-x(2)-[DE]-x(2)-G-x(2)-Y-x-[FY]-[LIVMK]

[ 1] Stehle T., Schulz G.E. J. Mol. Biol. 224:1127-1141(1992).
[ 2] Bryant P.J., Woods D.F. Cell 68:621-622(1992).
[ 3] Goebl M.G. Trends Biochem. Sci. 17:99-99(1992).
[ 4] Zschocke P.D., Schiltz E., Schulz G.E. Eur. J. Biochem. 213:263-269(1993).
[ 5] Woods D.F., Bryant P.J. Mech. Dev. 44:85-89(1994).

[0739] 249. (Glyco_hydro_35) 

J^ZngToTlS gb^^^^^^^^^ and glycosaminoglycans and whose deficiency is the cause of the genetic disease 

One of the conserved regions in these enzymes contains a glutamic acid residue which, on the basis of similarities with other families of glycosyl hydrolases [4], probably acts as the proton donor in the catalytic mechanism.

[0^4^ Co:«a;:rS'^^^^^^ second E is the putetive active s.e .sidue, 

Sequences known to belong to this class detected by the pattern ALL. 

[ 1] Taron C.H., Benner J.S., Hornstra L.J., Guthrie E.P. Glycobiology 5:603-610(1995).
[ 2] Carey A.T., Holt K., Picard S., Wilde R., Tucker G.A., Bird C.R., Schuch W., Seymour G.B. Plant Physiol. 108:1099-1107(1995).
[ 3] Henrissat B., Bairoch A. Biochem. J. 293:781-788(1993).
[ 4] Henrissat B., Callebaut I., Fabrega S., Lehn P., Mornon J.-P., Davies G. Proc. Natl. Acad. Sci. U.S.A. 92:7090-7094(1995).

[0744] 250. (Glyco_hydro_16)

[0745] It has been shown [1] that the following glycosyl hydrolases can be classified into a single family on the basis of sequence similarities:

- Bacterial beta-1,3-1,4-glucanases, or lichenases, (EC 3.2.1.73) mainly from Bacillus but also from Clostridium thermocellum (gene licB), Fibrobacter succinogenes and Rhodothermus marinus (gene bglA).
- Bacillus circulans beta-1,3-glucanase A1 (EC 3.2.1.39) (gene glcA).
- Laminarinase (EC 3.2.1.6) from Clostridium thermocellum (gene lam1).
- Streptomyces coelicolor agarase (EC 3.2.1.81) (gene dagA).
- Alteromonas carrageenovora kappa-carrageenase (EC 3.2.1.83) (gene cgkA).

[0746] Two closely clustered conserved glutamates have been shown [2] to be involved in the catalytic activity of Bacillus licheniformis lichenase. The region that contains these residues was used as a signature pattern.

[0747] Consensus pattern: E-[LIV]-D-[LIV]-x(0,1)-E-x(2)-[GQ]-[KRNF]-x-[PSTA] [The two E's are active site residues]

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).
[ 2] Juncosa M., Pons J., Dot T., Querol E., Planas A. J. Biol. Chem. 269:14530-14535(1994).
[0748] 251. (Glyco_hydro_17)
Glycosyl hydrolases family 17 signature
(aka glycosyl_hydro4)

[0749] It has been shown [1,2] that the following glycosyl hydrolases can be classified into a single family on the basis of sequence similarities:

- Glucan endo-1,3-beta-glucosidases (EC 3.2.1.39) (endo-(1->3)-beta-glucanase) from various plants. This enzyme may be involved in the defense of plants against pathogens through its ability to degrade fungal cell wall polysaccharides.
- Glucan 1,3-beta-glucosidase (EC 3.2.1.58) (exo-(1->3)-beta-glucanase) from yeast (gene BGL2). This enzyme may play a role in cell expansion during growth, in cell-cell fusion during mating, and in spore release during sporulation.
- Lichenases (EC 3.2.1.73) (endo-(1->3,1->4)-beta-glucanase) from various plants.

[0750] The best conserved region in the sequence of these enzymes is located in their central section. This region contains a conserved tryptophan residue which could be involved in the interaction with the glucan substrates [2] and it also contains a conserved glutamate which has been shown [3] to act as the nucleophile in the catalytic mechanism. This region was used as a signature pattern.

[0751] Consensus pattern: [LIVM]-x-[LIVMFYWA](3)-[STAG]-E-[STA]-G-W-P-[STN]-x-[SAGQ] [E is an active site residue] Sequences known to belong to this class detected by the pattern: ALL.

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).
[ 2] Ori N., Sessa G., Lotan T., Himmelhoch S., Fluhr R. EMBO J. 9:3429-3436(1990).
[ 3] Varghese J.N., Garrett T.P.J., Colman P.M., Chen L., Hoj P.J., Fincher G.B. Proc. Natl. Acad. Sci. U.S.A. 91:2785-2789(1994).

[0752] 252. (Glyco_hydro_3) 
Glycosyl hydrolases family 3 active site 

[0753] It has been shown [1,2] that the following glycosyl hydrolases can be, on the basis of sequence similarities, classified into a single family:

- Beta-glucosidases (EC 3.2.1.21) from the fungi Aspergillus wentii (A-3), Hansenula anomala, Kluyveromyces fragilis, Saccharomycopsis fibuligera (BGL1 and BGL2), Schizophyllum commune and Trichoderma reesei (BGL1).
- Beta-glucosidases from the bacteria Agrobacterium tumefaciens (Cbg1), Butyrivibrio fibrisolvens (bglA), Clostridium thermocellum (bglB), Escherichia coli (bglX), Erwinia chrysanthemi (bgxA) and Ruminococcus albus.
- Alteromonas strain O-7 beta-hexosaminidase A (EC 3.2.1.52).
- Bacillus subtilis hypothetical protein yzbA.
- Escherichia coli hypothetical protein ycfO and HI0959, the corresponding Haemophilus influenzae protein.

[0754] One of the conserved regions in these enzymes is centered on a conserved aspartic acid residue which has been shown [3], in Aspergillus wentii beta-glucosidase A3, to be implicated in the catalytic mechanism. This region was used as a signature pattern.

[0755] Consensus pattern: [LIVM](2)-[KR]-x-[EQK]-x(4)-G-[LIVMFT]-[LIVT]-[LIVMF]-[ST]-D-x(2)-[SGADNI] [D is the active site residue] Sequences known to belong to this class detected by the pattern: ALL.
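Since the family 3 pattern is built around the catalytic aspartate, a match can be used to propose the position of that residue in a new sequence. The sketch below is illustrative only: the regular expression is hand-translated from the consensus pattern above, with a capture group around the D, and the function name `active_site_positions` and the synthetic sequence are hypothetical.

```python
import re

# The capture group marks the aspartate flagged as the active site residue.
GH3_SITE = re.compile(
    r"[LIVM]{2}[KR].[EQK].{4}G[LIVMFT][LIVT][LIVMF][ST](D).{2}[SGADNI]"
)


def active_site_positions(sequence):
    """Return 1-based positions of the putative active-site Asp in each hit."""
    return [m.start(1) + 1 for m in GH3_SITE.finditer(sequence)]


if __name__ == "__main__":
    # Synthetic sequence constructed to satisfy the pattern once.
    print(active_site_positions("MM" + "LLKAEAAAAGLLLSDAAS"))
```

The reported position is only a prediction from sequence similarity; the text above cites experimental evidence for a single enzyme, Aspergillus wentii beta-glucosidase A3.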

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).
[ 2] Castle L.A., Smith K.D., Morris R.O. J. Bacteriol. 174:1478-1486(1992).
[ 3] Bause E., Legler G. Biochim. Biophys. Acta 626:459-465(1980).

[0756] 253. (Glyco_hydro_28)

Polygalacturonase active site (aka PG)

[0757] Polygalacturonase (EC 3.2.1.15) (PG) (pectinase) [1,2] catalyzes the random hydrolysis of 1,4-alpha-D-galactosiduronic linkages in pectate and other galacturonans. In fruit, polygalacturonase plays an important role in cell wall metabolism during ripening. In plant bacterial pathogens such as Erwinia carotovora or Pseudomonas solanacearum and fungal pathogens such as Aspergillus niger, polygalacturonase is involved in maceration and soft-rotting of plant tissue.

[0758] Exo-poly-alpha-D-galacturonosidase (EC 3.2.1.82) (exoPG) [3] hydrolyzes pectic acid from the non-reducing end, releasing digalacturonate.
[0759] Prokaryotic, eukaryotic PG and exoPG share a few regions of sequence similarity. The best conserved of these regions is centered on a conserved histidine most probably involved in the catalytic mechanism.

[0760] Consensus pattern: [GSDENKRH]-x(2)-[VMFC]-x(2)-[GS]-H-G-[LIVMAG]-x(1,2)-[LIVM]-G-S [H is the putative active site residue] Sequences known to belong to this class detected by the pattern: ALL.
[0761] Note: these proteins belong to family 28 in the classification of glycosyl hydrolases [5].

[ 1] Ruttkowski E., Labitzke R., Khanh N.Q., Loeffler F., Gottschalk M., Jany K.-D. Biochim. Biophys. Acta 1087:
104-106(1990).

[ 2] Huang J., Schell M.A. J. Bacteriol. 172:3879-3887(1990).

[3] He S.Y., Collmer A. J. Bacteriol. 172:4988-4995(1990). 

[4] Bussink H.J.D., Buxton P.P., Visser J. Curr. Genet. 19:467-474(1991). 

[ 5] Henrissat B. Biochem. J. 280:309-316(1991). 

[0762] 254. (Glyco_hydro_32) 

Glycosyl hydrolases family 32 active site

[0763] It has been shown [1,2] that the following glycosyl hydrolases can be classified into a single family on the
basis of sequence similarities:

- Inulinase (EC 3.2.1.7) (or inulase) from the fungus Kluyveromyces marxianus.
- Beta-fructofuranosidase (EC 3.2.1.26), commonly known as invertase in fungi and plants and as sucrase in bacteria
(gene sacA or scrB).
- Raffinose invertase (EC 3.2.1.26) (gene rafD) from Escherichia coli plasmid pRSD2.
- Levanase (EC 3.2.1.65) (gene sacC) from Bacillus subtilis.

[0764] One of the conserved regions in these enzymes is located in the N-terminal section and contains an aspartic
acid residue which has been shown [3], in yeast invertase, to be important for the catalytic mechanism. This region was
used as a signature pattern.

[0765] Consensus pattern: H-x(2)-P-x(4)-[LIVM]-N-D-P-N-G [D is the active site residue] Sequences known to belong
to this class detected by the pattern: ALL.

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).
[ 2] Gunasekaran P., Karunakaran T., Cami B., Mukundan A.G., Preziosi L., Baratti J. J. Bacteriol. 172:6727-6735
(1990).
[ 3] Reddy V.A., Maley F. J. Biol. Chem. 265:10817-10820(1990).
[0766] 255. (Glyco_hydro_1) 

Glycosyl hydrolases family 1 signatures

[0767] It has been shown [1 to 4] that the following glycosyl hydrolases can be, on the basis of sequence similarities,
classified into a single family:

- Beta-glucosidases (EC 3.2.1.21) from various bacteria such as Agrobacterium strain ATCC 21400, Bacillus polymyxa,
and Caldocellum saccharolyticum.
- Two plant (clover) beta-glucosidases (EC 3.2.1.21).
- Two different beta-galactosidases (EC 3.2.1.23) from the archaebacterium Sulfolobus solfataricus (genes bgaS and
lacS).
- 6-phospho-beta-galactosidases (EC 3.2.1.85) from various bacteria such as Lactobacillus casei, Lactococcus lactis,
and Staphylococcus aureus.
- 6-phospho-beta-glucosidases (EC 3.2.1.86) from Escherichia coli (genes bglB and ascB) and from Erwinia chrysanthemi
(gene arbB).
- Plant myrosinases (EC 3.2.3.1) (sinigrinase) (thioglucosidase).
- Mammalian lactase-phlorizin hydrolase (LPH) (EC 3.2.1.108 / EC 3.2.1.62). LPH, an integral membrane glycoprotein,
is the enzyme that splits lactose in the small intestine. LPH is a large protein of about 1900 residues which
contains four tandem repeats of a domain of about 450 residues which is evolutionarily related to the above glycosyl
hydrolases.

[0768] One of the conserved regions in these enzymes is centered on a conserved glutamic acid residue which has
been shown [5], in the beta-glucosidase from Agrobacterium, to be directly involved in glycosidic bond cleavage by
acting as a nucleophile. This region was used as a signature pattern. As a second signature pattern we selected a
conserved region, found in the N-terminal extremity of these enzymes; this region also contains a glutamic acid residue.
[0769] Consensus pattern: [LIVMFSTC]-[LIVFYS]-[LIV]-[LIVMST]-E-N-G-[LIVMFAR]-[CSAGN] [E is the active site
residue] Sequences known to belong to this class detected by the pattern: ALL.

[0770] Note: this pattern will pick up the last two domains of LPH; the first two domains, which are removed from the
LPH precursor by proteolytic processing, have lost the active site glutamate and may therefore be inactive [4].
[0771] Consensus pattern: F-x-[FYWM]-[GSTA]-x-[GSTA]-x-[GSTA](2)-[FYNH]-[NQ]-x-E-x-[GSTA] Sequences
known to belong to this class detected by the pattern: ALL.
[0772] Note: this pattern will pick up the last three domains of LPH.
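
The notes above observe that a single pattern can hit several tandem domains of one protein, and that a domain which has lost the active site glutamate is not detected. A minimal sketch of that behaviour follows, assuming a hand-written regular expression equivalent to the first family 1 pattern. The sequences are synthetic stand-ins built for illustration (they are not LPH), and the function name `active_site_positions` is hypothetical.

```python
import re
from typing import List

# Regex equivalent of the first family 1 consensus pattern in the text:
# [LIVMFSTC]-[LIVFYS]-[LIV]-[LIVMST]-E-N-G-[LIVMFAR]-[CSAGN]
# The active site glutamate is captured as group 1.
ACTIVE_SITE = re.compile(r"[LIVMFSTC][LIVFYS][LIV][LIVMST](E)NG[LIVMFAR][CSAGN]")


def active_site_positions(sequence: str) -> List[int]:
    """Return the 1-based position of the glutamate in every pattern match."""
    return [m.start(1) + 1 for m in ACTIVE_SITE.finditer(sequence)]


# Synthetic three-domain sequence: the first domain carries a glutamine in place
# of the glutamate, the other two carry the intact motif.
domain_without_site = "GGGG" + "LLILQNGLA" + "GGGG"
domain_with_site = "GGGG" + "LLILENGLA" + "GGGG"
protein = domain_without_site + domain_with_site + domain_with_site

positions = active_site_positions(protein)
```

Only the two domains that retain the glutamate are reported; the substituted domain is skipped, which mirrors the note on the processed N-terminal domains of LPH.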



[ 1] Henrissat B. Biochem. J. 280:309-316(1991).
[ 2] Henrissat B. Protein Seq. Data Anal. 4:61-62(1991).
[ 3] Gonzalez-Candelas L., Ramon D., Polaina J. Gene 95:31-38(1990).
[ 4] El Hassouni M., Henrissat B., Chippaux M., Barras F. J. Bacteriol. 174:765-777(1992).
[ 5] Withers S.G., Warren R.A.J., Street I.P., Rupitz K., Kempton J.B., Aebersold R. J. Am. Chem. Soc. 112:
5887-5889(1990).

[0773] 256. Glyco_hydro_20
Glycosyl hydrolase family 20
Previous Pfam IDs: glycosyl_hydr11;
Number of members: 33
[0774] 257. (Glyco_hydro_9)
Glycosyl hydrolases family 9 active sites signatures
(aka Glycosyl_hydr12)

[0775] The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases
(EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1,2]. Fungi and bacteria
produce a spectrum of cellulolytic enzymes (cellulases) and xylanases which, on the basis of sequence similarities,
can be classified into families. One of these families is known as the cellulase family E [3] or as the glycosyl hydrolases
family 9 [4,E1]. The enzymes which are currently known to belong to this family are listed below.

- Butyrivibrio fibrisolvens cellodextrinase 1 (ced1).
- Cellulomonas fimi endoglucanases B (cenB) and C (cenC).
- Clostridium cellulolyticum endoglucanase G (celCCG).
- Clostridium cellulovorans endoglucanase C (engC).
- Clostridium stercorarium endoglucanase Z (avicelase I) (celZ).
- Clostridium thermocellum endoglucanases D (celD), F (celF) and I (celI).
- Fibrobacter succinogenes endoglucanase A (endA).
- Pseudomonas fluorescens endoglucanase A (celA).
- Streptomyces reticuli endoglucanase 1 (cel1).
- Thermomonospora fusca endoglucanase E-4 (celD).
- Dictyostelium discoideum spore germination specific endoglucanase 270-6. This slime mold enzyme may digest
the spore cell wall during germination, to release the enclosed amoeba.
- Endoglucanases from plants such as avocado or French bean. In plants this enzyme may be involved in the
fruit-ripening process.

[0776] Two of the most conserved regions in these enzymes are centered on conserved residues which have been
shown [5,6], in the endoglucanase D from Clostridium thermocellum, to be important for the catalytic activity. The
first region contains an active site histidine and the second region contains two catalytically important residues: an
aspartate and a glutamate. Both regions were used as signature patterns.

[0777] Consensus pattern: [STV]-x-[LIVMFY]-[STV]-x(2)-G-x-[NKR]-x(4)-[PLIVM]-H-x-R [H is an active site residue]
Sequences known to belong to this class detected by the pattern: ALL, except for Cellulomonas fimi cenC and
Streptomyces reticuli cel1.

[0778] Consensus pattern: [FYW]-x-D-x(4)-[FYW]-x(3)-E-x-[STA]-x(3)-N-[STA] [D and E are active site residues]
Sequences known to belong to this class detected by the pattern: ALL, except for Fibrobacter succinogenes endA, whose
sequence seems to be incorrect.

[ 1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990).
[ 2] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).
[ 3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989).
[ 4] Henrissat B. Biochem. J. 280:309-316(1991).
[ 5] Tomme P., Chauvaux S., Beguin P., Millet J., Aubert J.-P., Claeyssens M. J. Biol. Chem. 266:10313-10318
(1991).
[ 6] Tomme P., van Beeumen J., Claeyssens M. Biochem. J. 285:319-324(1992).

[0779] 258. Matrix protein (MA), p15 (GAG_ma)
[0780] The matrix protein, p15, is encoded by the gag gene. MA is involved in pathogenicity [1].
[0781] [1] Pozsgay JM, Beilharz MW, Wines BD, Hess AD, Pitha PM. J Virol 1993;67:5989-5999.
[0782] 259. Gag polyprotein, inner coat protein p12 (GAG_P12)

[0783] The retroviral p12 is a virion structural protein. p12 is proline rich. The function carried out by p12 in assembly
and replication is unknown. p12C is associated with pathogenicity of the virus.

[1] Pozsgay JM, Beilharz MW, Wines BD, Hess AD, Pitha PM. J Virol 1993;67:5989-5999.

[0784] 260. Glutamine synthetase signatures (GLN-SYNT)

Glutamine synthetase (EC 6.3.1.2) (GS) [1] plays an essential role in the metabolism of nitrogen by catalyzing the
condensation of glutamate and ammonia to form glutamine. There seem to be three different classes of GS [2,3,4]: -
Class I enzymes (GSI) are specific to prokaryotes, and are oligomers of 12 identical subunits. The activity of GSI-type
enzymes is controlled by the adenylation of a tyrosine residue. The adenylated enzyme is inactive. - Class II enzymes
(GSII) are found in eukaryotes and in bacteria belonging to the Rhizobiaceae, Frankiaceae, and Streptomycetaceae
families (these bacteria also have a class-I GS). GSII are octamers of identical subunits. Plants have two or more
isozymes of GSII; one of the isozymes is translocated into the chloroplast. - Class III enzymes (GSIII) have currently
only been found in Bacteroides fragilis and in Butyrivibrio fibrisolvens. GSIII is a hexamer of identical chains. It is much
larger (about 700 amino acids) than the GSI (450 to 470 amino acids) or GSII (350 to 420 amino acids) enzymes.
While the three classes of GS's are clearly structurally related, the sequence similarities are not so extensive. As
signature patterns three conserved regions were selected. The first pattern is based on a conserved tetrapeptide in
the N-terminal section of the enzyme, the second one is based on a glycine-rich region which is thought to be involved
in ATP binding. The third pattern is specific to class I glutamine synthetases and includes the tyrosine residue which
is adenylated.

[0785] Consensus pattern: [FYWL]-D-G-S-S-x(6,8)-[DENQSTAK]-[SA]-[DE]-x(2)-[LIVMFY]

Consensus pattern: K-P-[LIVMFYA]-x(3,5)-[NPAT]-G-[GSTAN]-G-x-H-x(3)-S

Consensus pattern: K-[LIVM]-x(5)-[LIVMA]-D-[RK]-[DN]-[LI]-Y [Y is the site of adenylation]
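
Because the third pattern is described as specific to class I enzymes, the three signatures together give a simple presence/absence profile for a candidate sequence. The sketch below illustrates that idea under stated assumptions: the regular expressions are hand translations of the three patterns above, the key names in `GS_SIGNATURES` and the function `scan_gs_signatures` are hypothetical, and the sequences are synthetic fragments built to satisfy or lack each motif rather than real glutamine synthetases.

```python
import re
from typing import Dict

# Hand-translated regex equivalents of the three consensus patterns above.
GS_SIGNATURES = {
    "n_terminal": re.compile(r"[FYWL]DGSS.{6,8}[DENQSTAK][SA][DE].{2}[LIVMFY]"),
    "atp_binding": re.compile(r"KP[LIVMFYA].{3,5}[NPAT]G[GSTAN]G.H.{3}S"),
    "class_i_adenylation": re.compile(r"K[LIVM].{5}[LIVMA]D[RK][DN][LI]Y"),
}


def scan_gs_signatures(sequence: str) -> Dict[str, bool]:
    """Report, for each signature, whether it occurs anywhere in the sequence."""
    return {name: rx.search(sequence) is not None for name, rx in GS_SIGNATURES.items()}


# Synthetic motif instances (illustrative only).
N_TERM = "FDGSSAAAAAADSDAAL"
ATP = "KPLAAANGGGAHAAAS"
ADENYL = "KLAAAAALDRDLY"

class_i_like = "MT" + N_TERM + "GGG" + ATP + "GGG" + ADENYL
other_class_like = "MT" + N_TERM + "GGG" + ATP + "GGG"

class_i_hits = scan_gs_signatures(class_i_like)
other_hits = scan_gs_signatures(other_class_like)
```

A sequence that matches all three patterns is consistent with a class I enzyme, while one that matches only the first two is not; this is a screening heuristic, not a classification method stated in the text.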

[ 1] Eisenberg D., Almassy R.J., Janson C.A., Chapman M.S., Suh S.W., Cascio D., Smith W.W. Cold Spring
Harbor Symp. Quant. Biol. 52:483-490(1987).
[ 2] Kumada Y., Benson D.R., Hillemann D., Hosted T.J., Rochefort D.A., Thompson C.J., Wohlleben W., Tateno
Y. Proc. Natl. Acad. Sci. U.S.A. 90:3009-3013(1993).
[ 3] Shatters R.G., Kahn M.L. J. Mol. Evol. 29:422-428(1989).
[ 4] Brown J.R., Masuchi Y., Robb F.T., Doolittle W.F. J. Mol. Evol. 38:566-576(1994).

[0786] 261. Globins profile (globin1)

Globins are heme-containing proteins involved in binding and/or transporting oxygen [1]. They belong to a very large
and well studied family which is widely distributed in many organisms. The major groups of globins are: - Hemoglobins
(Hb) from vertebrates. Hb is the protein responsible for transporting oxygen from the lungs to other tissues. It is a
tetramer of two alpha and two beta chains. Most vertebrate species also express specific embryonic or fetal forms of
hemoglobin where the alpha or the beta chains are replaced by a chain with higher oxygen affinity, as for the gamma,
delta, epsilon and zeta chains in mammals, for example. - Myoglobins (Mb) from vertebrates. Mb is a monomeric
protein responsible for oxygen storage in muscles. - Invertebrate globins [2]. A wide variety of globins are found in
invertebrates. Molluscs generally have one or two muscle globins which are either monomeric or dimeric. Insects, such
as the midge Chironomus thummi, have a large set of extracellular globins. Nematodes and annelids have a variety
of intracellular and extracellular globins; some of them are multi-domain polypeptides (from two up to nine-domain
globins) and some produce large, disulfide-bonded aggregates. - Leghemoglobins (Lg) from the root nodules of
leguminous plants. Lg provides oxygen for bacteroids. - Flavohemoproteins from bacteria (Escherichia coli hmpA) and
fungi [3]. These proteins consist of two distinct domains: an N-terminal globin domain and a C-terminal FAD-containing
reductase domain. In bacteria such as Vitreoscilla, the enzyme-associated globin is a single-domain protein. All these
globins seem to have evolved from a common ancestor. The profile developed to detect members of the globin family
is based on a structural alignment of selected globin sequences.

[ 1] Concise Encyclopedia Biochemistry, Second Edition, Walter de Gruyter, Berlin New-York (1988).
[ 2] Goodman M., Pedwaydon J., Czelusniak J., Suzuki T., Gotoh T., Moens L., Shishikura F., Walz D., Vinogradov S.
J. Mol. Evol. 27:236-249(1988).

[0787] Plant hemoglobins signature (globin2)

Leghemoglobins [1] are hemoproteins present in the root nodules of leguminous plants. Leghemoglobins are structurally
and functionally related to hemoglobin and myoglobin. By providing oxygen to the bacteroids, they are essential for
symbiotic nitrogen fixation. Structurally related hemoglobins from the nodules of non-leguminous plants [2,3], and from
the roots of non-nodulating plants [4] have been recently sequenced. A signature pattern was developed that picks up
the sequences of plant hemoglobins exclusively.
[0788] Consensus pattern: [SN]-P-x-L-x(2)-H-A-x(3)-F

[ 1] Powell R., Gannon F. BioEssays 9:117-121(1988).
[ 2] Kortt A.A., Trinick M.J., Appleby C.A. Eur. J. Biochem. 175:141-149(1988).
[ 3] Kortt A.A., Inglis A.S., Fleming A.I., Appleby C.A. FEBS Lett. 231:341-346(1988).
[ 4] Bogusz D., Appleby C.A., Landsmann J., Dennis E.S., Trinick M.J., Peacock W.J. Nature 331:178-180(1988).

[0789] 262. Fructose-bisphosphate aldolase class-I active site (glycolytic_enz)

[0790] Fructose-bisphosphate aldolase [1,2] is a glycolytic enzyme that catalyzes the reversible aldol cleavage or
condensation of fructose-1,6-bisphosphate into dihydroxyacetone-phosphate and glyceraldehyde 3-phosphate. There
are two classes of fructose-bisphosphate aldolases with different catalytic mechanisms. Class-I aldolases [3], mainly
found in higher eukaryotes, are homotetrameric enzymes which form a Schiff-base intermediate between the C-2
carbonyl group of the substrate (dihydroxyacetone phosphate) and the epsilon-amino group of a lysine residue. In
vertebrates, three forms of this enzyme are found: aldolase A in muscle, aldolase B in liver and aldolase C in brain.
The sequence around the lysine involved in the Schiff-base is highly conserved and can be used as a signature for
this class of enzyme.

[0791] Consensus pattern: [LIVM]-x-[LIVMFYW]-E-G-x-[LS]-L-K-P-[SN] [K is involved in Schiff-base formation]

[ 1] Perham R.N. Biochem. Soc. Trans. 18:185-187(1990).
[ 2] Marsh J.J., Lebherz H.G. Trends Biochem. Sci. 17:110-113(1992).
[ 3] Freemont P.S., Dunbar B., Fothergill-Gilmore L.A. Biochem. J. 249:779-788(1988).

[0792] 263. Glycosyl hydrolases family 11 active sites signatures

The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases (EC
3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1,2]. Fungi and bacteria
produce a spectrum of cellulolytic enzymes (cellulases) and xylanases which, on the basis of sequence similarities, can
be classified into families. One of these families is known as the cellulase family G [3] or as the glycosyl hydrolases
family 11 [4,E1]. The enzymes which are currently known to belong to this family are listed below. - Aspergillus awamori
xylanase C (xynC). - Bacillus circulans, pumilus, stearothermophilus and subtilis xylanase (xynA). - Clostridium
acetobutylicum xylanase (xynB). - Clostridium stercorarium xylanase A (xynA). - Fibrobacter succinogenes xylanase C
(xynC) which consists of two catalytic domains that both belong to family 10. - Neocallimastix patriciarum xylanase A
(xynA). - Ruminococcus flavefaciens bifunctional xylanase XYLA (xynA). This protein consists of three domains: an
N-terminal xylanase catalytic domain that belongs to family 11 of glycosyl hydrolases; a central domain composed of
short repeats of Gln, Asn and Trp; and a C-terminal xylanase catalytic domain that belongs to family 10 of glycosyl
hydrolases. - Schizophyllum commune xylanase A. - Streptomyces lividans xylanases B (xlnB) and C (xlnC). -
Trichoderma reesei xylanases I and II. Two of the conserved regions in these enzymes are centered on glutamic acid
residues which have both been shown [5], in Bacillus pumilus xylanase, to be necessary for catalytic activity. Both
regions were used as signature patterns.

[0793] Consensus pattern: [PSA]-[LQ]-x-E-Y-Y-[LIVM](2)-[DE]-x-[FYWHN] [E is an active site residue]
Consensus pattern: [LIVMF]-x(2)-E-[AG]-[YWG]-[QRFGS]-[SG]-[STAN]-G-x-[SAF] [E is an active site residue]

[ 1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990).
[ 2] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).
[ 3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989).
[ 4] Henrissat B. Biochem. J. 280:309-316(1991).
[ 5] Ko E.P., Akatsuka H., Moriyama H., Shinmyo A., Hata Y., Katsube Y., Urabe I., Okada H. Biochem. J. 288:
117-121(1992).

[0794] 264. Glycosyl hydrolase family 14 
[0795] This family consists of beta-amylases.
It has been shown [1 to 4] that the following glycosyl hydrolases can be, on the basis of sequence similarities, classified
into a single family: - Beta-glucosidases (EC 3.2.1.21) from various bacteria such as Agrobacterium strain ATCC 21400,
Bacillus polymyxa, and Caldocellum saccharolyticum. - Two plant (clover) beta-glucosidases (EC 3.2.1.21). - Two
different beta-galactosidases (EC 3.2.1.23) from the archaebacterium Sulfolobus solfataricus (genes bgaS and lacS). -
6-phospho-beta-galactosidases (EC 3.2.1.85) from various bacteria such as Lactobacillus casei, Lactococcus lactis,
and Staphylococcus aureus. - 6-phospho-beta-glucosidases (EC 3.2.1.86) from Escherichia coli (genes bglB and ascB)
and from Erwinia chrysanthemi (gene arbB). - Plant myrosinases (EC 3.2.3.1) (sinigrinase) (thioglucosidase). -
Mammalian lactase-phlorizin hydrolase (LPH) (EC 3.2.1.108 / EC 3.2.1.62). LPH, an integral membrane glycoprotein, is
the enzyme that splits lactose in the small intestine. LPH is a large protein of about 1900 residues which contains four
tandem repeats of a domain of about 450 residues which is evolutionarily related to the above glycosyl hydrolases. One
of the conserved regions in these enzymes is centered on a conserved glutamic acid residue which has been shown
[5], in the beta-glucosidase from Agrobacterium, to be directly involved in glycosidic bond cleavage by acting as a
nucleophile. This region was used as a signature pattern. As a second signature pattern a conserved region was
selected, found in the N-terminal extremity of these enzymes; this region also contains a glutamic acid residue.
[0797] Consensus pattern: [LIVMFSTC]-[LIVFYS]-[LIV]-[LIVMST]-E-N-G-[LIVMFAR]-[CSAGN] [E is the active site
residue]
Note: this pattern will pick up the last two domains of LPH; the first two domains, which are removed from the LPH
precursor by proteolytic processing, have lost the active site glutamate and may therefore be inactive [4].
[0798] Consensus pattern: F-x-[FYWM]-[GSTA]-x-[GSTA]-x-[GSTA](2)-[FYNH]-[NQ]-x-E-x-[GSTA]

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).
[ 2] Henrissat B. Protein Seq. Data Anal. 4:61-62(1991).
[ 3] Gonzalez-Candelas L., Ramon D., Polaina J. Gene 95:31-38(1990).
[ 4] El Hassouni M., Henrissat B., Chippaux M., Barras F. J. Bacteriol. 174:765-777(1992).
[ 5] Withers S.G., Warren R.A.J., Street I.P., Rupitz K., Kempton J.B., Aebersold R. J. Am. Chem. Soc. 112:
5887-5889(1990).

It has been shown that the following glycosyl hydrolases can be classified into a single family: - Beta-galactosidases
(EC 3.2.1.23) from bacteria such as Escherichia coli (genes lacZ and ebgA) [...] Streptococcus thermophilus and from
the fungus Kluyveromyces lactis. - Beta-glucuronidase (EC 3.2.1.31) from Escherichia coli (gene uidA) and from
mammals. One of the conserved regions in these enzymes is centered on a conserved glutamic acid residue.

Consensus pattern: [DENQLF]-[KRVW]-N-[HRY]-[STAPV]-[SAC]-[LIVMFS](3)-W-[GS]-x(2,3)-N-E [E is the active site
residue]

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).
[ 2] Schroeder C.J., Robert C., Lenzen G., McKay L.L., Mercenier A. J. Gen. Microbiol. 137:369-380(1991).
[ 3] Gebler J.C., Aebersold R., Withers S.G. J. Biol. Chem. 267:11126-11130(1992).

It has been shown that the following glycosyl hydrolases can be classified
into a single family:

- Beta glucosidases (EC 3.2.1.21) from the fungi Aspergillus wentii (A-3), Hansenula anomala, Kluyveromyces
fragilis, Saccharomycopsis fibuligera (BGL1 and BGL2), Schizophyllum commune and Trichoderma reesei (BGL1).
- Beta glucosidases from the bacteria Agrobacterium tumefaciens (Cbg1), Butyrivibrio fibrisolvens (bglA), Clostridium
thermocellum (bglB), Escherichia coli (bglX), Erwinia chrysanthemi (bgxA) and Ruminococcus albus.
- Alteromonas strain O-7 beta-hexosaminidase A (EC 3.2.1.52).
- Bacillus subtilis hypothetical protein yzbA.
- Escherichia coli hypothetical protein ycfO and HI0959, the corresponding Haemophilus influenzae protein.

One of the conserved regions in these enzymes is centered on a conserved aspartic acid residue which has been
shown [3], in Aspergillus wentii beta-glucosidase A3, to be implicated in the catalytic mechanism. This region was used
as a signature pattern.

[0802] Consensus pattern: [LIVM](2)-[KR]-x-[EQK]-x(4)-G-[LIVMFT]-[LIVT]-[LIVMF]-[ST]-D-x(2)-[SGADNI] [D is the
active site residue]

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).
[ 2] Castle L.A., Smith K.D., Morris R.O. J. Bacteriol. 174:1478-1486(1992).
[ 3] Bause E., Legler G. Biochim. Biophys. Acta 626:459-465(1980).

[0803] 268. Glycosyl hydrolases family 8 signature

The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases (EC
3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1,2]. Fungi and bacteria
produce a spectrum of cellulolytic enzymes (cellulases) and xylanases which, on the basis of sequence similarities, can
be classified into families. One of these families is known as the cellulase family D [3] or as the glycosyl hydrolases
family 8 [4,E1]. The enzymes which are currently known to belong to this family are listed below. - Acetobacter xylinum
endonuclease cmcAX. - Bacillus strain KSM-330 acidic endonuclease K (Endo-K). - Cellulomonas josui endoglucanase
2 (celB). - Cellulomonas uda endoglucanase. - Clostridium cellulolyticum endoglucanase C (celCCC). - Clostridium
thermocellum endoglucanase A (celA). - Erwinia chrysanthemi minor endoglucanase y (celY). - Bacillus circulans
beta-glucanase (EC 3.2.1.73). - Escherichia coli hypothetical protein yhjM. The most conserved region in these
enzymes is a stretch of about 20 residues that contains two conserved aspartates. The first aspartate is thought [5] to
act as the nucleophile in the catalytic mechanism. This region was used as a signature pattern.
Consensus pattern: A-[ST]-D-[AG]-D-x(2)-[IM]-A-x-[SA]-[LIVM]-[LIVMG]-x-A-x(3)-[FW] [The first D is an active site
residue]

[ 1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990).
[ 2] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).
[ 3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989).
[ 4] Henrissat B. Biochem. J. 280:309-316(1991).
[ 5] Alzari P.M., Souchon H., Dominguez R. Structure 4:265-275(1996).

[0804] 269. Glycosyl hydrolases family 9 active sites signatures

The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases (EC
3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1,2]. Fungi and bacteria produce
a spectrum of cellulolytic enzymes (cellulases) and xylanases which, on the basis of sequence similarities, can be
classified into families. One of these families is known as the cellulase family E [3] or as the glycosyl hydrolases family
9 [4,E1]. The enzymes which are currently known to belong to this family are listed below. - Butyrivibrio fibrisolvens
cellodextrinase 1 (ced1). - Cellulomonas fimi endoglucanases B (cenB) and C (cenC). - Clostridium cellulolyticum
endoglucanase G (celCCG). - Clostridium cellulovorans endoglucanase C (engC). - Clostridium stercorarium
endoglucanase Z (avicelase I) (celZ). - Clostridium thermocellum endoglucanases D (celD), F (celF) and I (celI). - Fibrobacter
succinogenes endoglucanase A (endA). - Pseudomonas fluorescens endoglucanase A (celA). - Streptomyces reticuli
endoglucanase 1 (cel1). - Thermomonospora fusca endoglucanase E-4 (celD). - Dictyostelium discoideum spore
germination specific endoglucanase 270-6. This slime mold enzyme may digest the spore cell wall during germination, to
release the enclosed amoeba. - Endoglucanases from plants such as avocado or French bean. In plants this enzyme
may be involved in the fruit ripening process. Two of the most conserved regions in these enzymes are centered on
conserved residues which have been shown [5,6], in the endoglucanase D from Clostridium thermocellum, to be
important for the catalytic activity. The first region contains an active site histidine and the second region contains two
catalytically important residues: an aspartate and a glutamate. Both regions were used as signature patterns.
[0805] Consensus pattern: [STV]-x-[LIVMFY]-[STV]-x(2)-G-x-[NKR]-x(4)-[PLIVM]-H-x-R [H is an active site residue]

Consensus pattern: [FYW]-x-D-x(4)-[FYW]-x(3)-E-x-[STA]-x(3)-N-[STA] [D and E are active site residues]

[ 1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990).
[ 2] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).
[ 3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989).
[ 4] Henrissat B. Biochem. J. 280:309-316(1991).
[ 5] Tomme P., Chauvaux S., Beguin P., Millet J., Aubert J.-P., Claeyssens M. J. Biol. Chem. 266:10313-10318
(1991).
[ 6] Tomme P., van Beeumen J., Claeyssens M. Biochem. J. 285:319-324(1992).
[0806] 270. Glyceraldehyde 3-phosphate dehydrogenase active site (gpdh)

Glyceraldehyde 3-phosphate dehydrogenase (EC 1.2.1.12) (GAPDH) [1] is a tetrameric NAD-binding enzyme common
to both the glycolytic and gluconeogenic pathways. A cysteine in the middle of the molecule is involved in forming a
covalent phosphoglycerol thioester intermediate. The sequence around this cysteine is totally conserved in eubacterial
and eukaryotic GAPDHs and is also present, albeit in a variant form, in the otherwise highly divergent archaebacterial
GAPDH [2]. Escherichia coli D-erythrose 4-phosphate dehydrogenase (E4PDH) (gene epd or gapB) is an enzyme highly
related to GAPDH [3].

[0807] Consensus pattern: [ASV]-S-C-[NT]-T-x(2)-[LIM] [C is the active site residue]

[ 1] Harris J.I., Waters M. (In) The Enzymes (3rd edition) 13:1-50(1976).
[ 2] Fabry S., Lang J., Niermann T., Vingron M., Hensel R. Eur. J. Biochem. 179:405-413(1989).
[ 3] Zhao G., Pease A.J., Bharani N., Winkler M.E. J. Bacteriol. 177:2804-2812(1995).

Granulins [1] are a family of cysteine-rich peptides of about 6 Kd which may have multiple biological activities. A
precursor protein (known as acrogranin) potentially encodes seven different forms of granulin (grnA to grnG) which are
probably released by post-translational proteolytic processing. A schematic representation of the structure of a granulin
is shown below:

xxCxxxxxCCxxxxxxxxCCxxxxxxCC [...]

'C': conserved cysteine probably involved in a disulfide bond. '*': position of the pattern. Granulins are evolutionarily
related to PMP-D1, a peptide extracted from the pars intercerebralis of migratory locusts [2].

Consensus pattern: C-x-D-x(2)-H-C-C-P-x(4)-C [The four C's are probably involved in disulfide bonds]

[ 1] Bhandari V., Palfree R.G., Bateman A. Proc. Natl. Acad. Sci. U.S.A. 89:1715-1719(1992).
[ 2] Nakakura N., Hietter H., van Dorsselaer A., Luu B. Eur. J. Biochem. 204:147-153(1992).

[0810] 272. (HCV_RdRP) Hepatitis C virus RNA dependent RNA polymerase

[0811] The RNA dependent RNA polymerase is also known as non-structural protein NS5B. NS5B is a 65 kDa protein
that resembles other viral RNA polymerases. HCV replication is thought to occur in membrane bound replication
complexes. These complexes transcribe the positive strand and the resulting minus strand is used as a template for the
synthesis of genomic RNA. There are two viral proteins involved in the reaction, NS3 and NS5B [1,2].
[0812] [1] Lohmann V, Korner F, Herian U, Bartenschlager R; J Virol 1997;71:8416-8428.
[2] Behrens SE, Tomei L, De Francesco R; EMBO J 1996;15:12-22.
[3] Ishido S, Fujita T, Hotta H; Biochem Biophys Res Commun 1998;244:35-40.
[0813] 273. (HHH) Helix-hairpin-helix motif.

[0814] [1] Doherty AJ, Serpell LC. Ponting CP; Nucleic Acids Res 1996;24:2488-2497. 

A family of small proteins of about 12 to 16 Kd has been described [1]. This family currently consists of: -
Mammalian protein HINT (also known as Protein kinase C inhibitor 1 or PKCI-1),
a specific inhibitor of PKC. It has been shown to bind zinc. - Fission yeast diadenosine 5',5'''-P1,P4-tetraphosphate
asymmetrical hydrolase (Ap4Aase) (EC 3.6.1.17) [2] (gene aph1), which cleaves A-5'-PPPP-5'A to yield AMP and ATP.
- FHIT, a human protein [...] - Escherichia coli hypothetical protein ycfF. - Haemophilus influenzae
hypothetical protein HI0961. - Helicobacter pylori hypothetical protein HP0404. - Methanococcus jannaschii
hypothetical protein MJ0866. - Mycobacterium leprae hypothetical protein U296A. - Synechocystis strain PCC 6803 hypothetical
protein slr1234. - Caenorhabditis elegans [...] Azospirillum brasilense. - A hypothetical 13.1 Kd protein in p37 5'region
in Mycoplasma hyorhinis. - A hypothetical 12.7 Kd protein in psbAII 5'region in Synechococcus strain PCC 7942. All
these proteins contain a region with three [...]

[ 1] Seraphin B. DNA Seq. 3:177-179(1992).
[ 2] Huang Y., Garrison P.N., Barnes L.D. Biochem. J. 312:925-932(1995).
[ 3] Barnes L.D., Garrison P.N., Siprashvili Z., Guranowski A., Robinson A.K., Ingram S.W., Croce C.M., Ohta M.,
Huebner K. Biochemistry 35:11529-11535(1996).
[ 4] Brenner C., Garrison P., Gilmour J., Peisach D., Ringe D., Petsko G.A., Lowenstein J.M. Nat. Struct. Biol. 4:
231-238(1997).

[0817] 275. Myc-type, 'helix-loop-helix' dimerization domain signature (HLH)

A number of eukaryotic proteins, which probably are sequence specific DNA-binding proteins that act as transcription
factors, share a conserved domain of 40 to 50 amino acid residues. It has been proposed [1] that this domain is formed
of two amphipathic helices joined by a variable length linker region that could form a loop. This 'helix-loop-helix' (HLH)
domain mediates protein dimerization and has been found in the proteins listed below [2,3,E1,E2]. Most of these
proteins have an extra basic region of about 15 amino acid residues that is adjacent to the HLH domain and specifically
binds to DNA. They are referred to as basic helix-loop-helix proteins (bHLH), and are classified in two groups: class A
(ubiquitous) and class B (tissue-specific). Members of the bHLH family bind variations on the core sequence 'CANNTG',
also referred to as the E-box motif. The homo- or heterodimerization mediated by the HLH domain is independent of,
but necessary for, DNA binding, as two basic regions are required for DNA binding activity. The HLH proteins lacking
the basic domain (Emc, Id) function as negative regulators since they form heterodimers, but fail to bind DNA. The
hairy-related proteins (hairy, E(spl), deadpan) also repress transcription although they can bind DNA. The proteins of
this subfamily act together with co-repressor proteins, like groucho, through their C-terminal motif WRPW. - The myc
family of cellular oncogenes [4], which is currently known to contain four members: c-myc [E3], N-myc, L-myc, and
B-myc. The myc genes are thought to play a role in cellular differentiation and proliferation. - Proteins involved in
myogenesis (the induction of muscle cells). In mammals MyoD1 (Myf-3), myogenin (Myf-4), Myf-5, and Myf-6 (Mrf4 or
herculin), in birds CMD1 (QMF-1), in Xenopus MyoD and MF25, in Caenorhabditis elegans CeMyoD, and in Drosophila
nautilus (nau). - Vertebrate proteins that bind specific DNA sequences ('E boxes') in various immunoglobulin chain
enhancers: E2A or ITF-1 (E12/pan-2 and E47/pan-1), ITF-2 (tcf4), TFE3, and TFEB. - Vertebrate neurogenic
differentiation factor 1 that acts as a differentiation factor during neurogenesis. - Vertebrate MAX protein, a transcription regulator
that forms a sequence-specific DNA-binding protein complex with myc or mad. - Vertebrate Max Interacting Protein
1 (MXI1 protein) which acts as a transcriptional repressor and may antagonize myc transcriptional activity by competing
for max. - Proteins of the bHLH/PAS superfamily which are transcriptional activators. In mammals, AH receptor nuclear
translocator (ARNT), single-minded homologs (SIM1 and SIM2), hypoxia-inducible factor 1 alpha (HIF1A), AH receptor
(AHR), neuronal PAS domain proteins (NPAS1 and NPAS2), endothelial PAS domain protein 1 (EPAS1), mouse ARNT2,
and human BMAL1. In Drosophila, single-minded (SIM), AH receptor nuclear translocator (ARNT), trachealess protein
(TRH), and similar protein (SIMA). - Mammalian transcription factors HES, which repress transcription by acting on
two types of DNA sequences, the E box and the N box. - Mammalian MAD protein (max dimerizer) which acts as a
transcriptional repressor and may antagonize myc transcriptional activity by competing for max. - Mammalian Upstream
Stimulatory Factor 1 and 2 (USF1 and USF2), which bind to a symmetrical DNA sequence that is found in a variety of
viral and cellular promoters. - Human lyl-1 protein, which is involved, by chromosomal translocation, in T-cell leukemia.
- Human transcription factor AP-4. - Mouse helix-loop-helix proteins MATH-1 and MATH-2 which activate E
box-dependent transcription in collaboration with E47. - Mammalian stem cell protein (SCL) (also known as tal1), a protein
which may play an important role in hemopoietic differentiation. SCL is involved, by chromosomal translocation, in
stem-cell leukemia. - Mammalian proteins Id1 to Id4 [5]. Id (inhibitor of DNA binding) proteins lack a basic DNA-binding

40 domain but are able to form heterodimers with other HLH proteins, thereby inhibiting binding to DNA. - Drosophila 
extra-macrochaetae (emc) protein, which participates in sensory organ patterning by antagonizing the neurogenic 
activity of the achaete- scute complex. Emc is the homolog of mammalian Id proteins. - Human Sterol Regulatory 
Element Binding Protein 1 (SREBP-1), a transcriptional activator that binds to the sterol regulatory element 1 (SRE- 
1) found in the flanking region of the LDLR gene and in other genes. - Drosophila achaete-scute (AS-C) complex 

45 proteins T3 (I'sc), T4 (scute), T5 (achaete) and TB (asense). The AS-C proteins are involved in the determination of 
the neuronal precursors in the peripheral nervous system and the central nervous system. - Mammalian homologs of 
achaete-scute proteins, the MASH-1 and MASH-2 proteins. - Drosophila atonal protein (ato) which is involved in neu- 
rogenesis. - Drosophila daughteriess (da) protein, which is essential for neurogenesis and sex-determination. - Dro- 
sophila deadpan (dpn). a hairy-like protein involved In the functional differentiation of neurons. - Drosophila delilah 

50 (del) protein, which is plays an Important role in the differentiation of epidenmal cells into muscle. - Drosophila hairy 
(h) protein, a transcriptional repressor which regulates the embryonic segmentation and adult bristle patterning. - Dro- 
sophila enhancer of split proteins E(spl), that are hairy-like proteins active during neurogenesis, also act as transcrip- 
tional repressors. - Drosophila twist (twi) protein, which is involved in the establishment of germ layers in embryos. - 
Maize anthocyanin regulatory proteins R-S and LC. - Yeast centromere-binding protein 1 (CPF1 or CBF1 ). This protein 

55 is involved in chromosomal segregation. It binds to a highly conserved DNA sequence, found in centromers and in 
several promoters. - Yeast 1N02 and 1N04 proteins. - Yeast phosphate system positive regulatory protein PH04 which 
interacts with the upstream activating sequence of several acid phosphatase genes. - Yeast serine-rich protein TYE7 
that is required for ty-mediated ADH2 expression. - Neurospora crassa nuc-1 . a protein that activates the transcription 
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of structural genes for phosphorus acquisition. - Fission yeast protein esc! which is involved in the sexual differentiation 

A schematic representation of the helix-loop-helix domain is shown here:

xxxxxxxxxxxxxxxxxxxxxxxx-------------------xxxxxxxxxxxxxxxxxxxxxxx
  Amphipathic helix 1          Loop          Amphipathic helix 2

The signature pattern developed to detect this domain completely spans the second amphipathic helix.

[0818] Consensus pattern: [DENSTAP]-[KTR]-[LIVMAGSNT]-{FYWCPHKR}-[LIVMT]-[LIVM]-x(2)-[STAV]-[LIVMSTACKR]-x-[VMFYH]-[LIVMTA]-{P}-{P}-[LIVMRKHQ]-

[ 1] Murre C., McCaw P.S., Baltimore D. Cell 56:777-783(1989).
[ 2] Garrel J., Campuzano S. BioEssays 13:493-498(1991).
[ 3] Kato G.J., Dang C.V. FASEB J. 6:3065-3072(1992).
[ 4] Krause M., Fire A., Harrison S.W., Priess J., Weintraub H. Cell 63:907-919(1990).
[ 5] Riechmann V., van Cruechten I., Sablitzky F. Nucleic Acids Res. 22:749-755(1994).

[0819] 276. HMG14 and HMG17 signature 

High mobility group (HMG) proteins are a family of relatively low molecular weight nonhistone components in chromatin. HMG14 and HMG17 [1], two related proteins of about 100 amino acid residues, bind to the inner side of the nucleosomal DNA, thus altering the interaction between the DNA and the histone octamer. These two proteins may be involved in the process which maintains transcribable genes in a unique chromatin conformation. The trout nonhistone chromosomal protein H6 (histone T) also belongs to this family. As a signature pattern, a conserved stretch of 10 residues located in the N-terminal section of HMG14 and HMG17 was selected.
[0820] Consensus pattern: R-R-S-A-R-L-S-A-[RK]-P-
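
As an illustration only, a consensus pattern written in the notation used throughout this section can be translated mechanically into a regular expression and used to scan a protein sequence. The following is a minimal sketch under stated assumptions: the function name `prosite_to_regex` and the test fragment are hypothetical and are not part of the original disclosure, and bracketed annotations such as "[C is the active site residue]" are assumed to have been removed before conversion.

```python
import re

# One pattern element: x, a residue letter, an allowed set [..] or an
# excluded set {..}, optionally followed by a repeat count (n) or (n,m).
_ELEMENT = re.compile(
    r"^(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?$"
)


def prosite_to_regex(pattern):
    """Translate a consensus pattern such as 'R-R-S-A-[RK]-P-' to a regex.

    Hypothetical helper for illustration; a trailing '-' or '.' is ignored.
    """
    parts = []
    for element in pattern.strip().rstrip("-.").split("-"):
        match = _ELEMENT.match(element.strip())
        if match is None:
            raise ValueError(f"unrecognized pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":                 # x means any residue
            core = "."
        elif core.startswith("{"):      # {P} means any residue except P
            core = "[^" + core[1:-1] + "]"
        if low is not None:             # (n) or (n,m) repeat counts
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


# Hypothetical protein fragment used only to exercise the pattern.
fragment = "MPKRKVSSAEGAAKEEPKRRSARLSAKPPAKVEAKPKK"
hmg_regex = prosite_to_regex("R-R-S-A-R-L-S-A-[RK]-P-")
hit = re.search(hmg_regex, fragment)
print(hmg_regex, hit.group(0) if hit else None)
```

The same helper should accept the variable-length and exclusion elements that appear in other patterns of this section, for example x(2,3) and {P}.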

[0821] [ 1] Bustin M., Reeves R. Prog. Nucleic Acid Res. Mol. Biol. 54:35-100(1996). 
[0822] 277. Hydroxymethylglutaryl-coenzyme A lyase active site (HMGL1) 

3-hydroxy-3-methylglutaryl-coenzyme A lyase (HMG-CoA lyase or HL) (EC 4.1.3.4) catalyzes the transformation of HMG-CoA into acetyl-CoA and acetoacetate. In vertebrates it is a mitochondrial enzyme which is involved in ketogenesis and in leucine catabolism [1]. In some bacteria, such as Pseudomonas mevalonii, it is involved in mevalonate catabolism (gene mvaB). A cysteine has been shown [2], in mvaB, to be required for the activity of the enzyme. The region around this residue is perfectly conserved and is used as a signature pattern.
[0823] Consensus pattern: S-V-A-G-L-G-G-C-P-Y [C is the active site residue]-

[ 1] Mitchell G.A., Robert M.-F., Hruz P.W., Wang S., Fontaine G., Behnke C.E., Mende-Mueller L.M., Schappert K., Lee C., Gibson K.M., Miziorko H.M. J. Biol. Chem. 268:4376-4381(1993).
[ 2] Hruz P.W., Narasimhan C., Miziorko H.M. Biochemistry 31:6842-6847(1992).

[0824] Alpha-isopropylmalate and homocitrate synthases signatures (HMGL2)

The following enzymes have been shown [1] to be functionally as well as evolutionarily related:
- Alpha-isopropylmalate synthase (EC 4.1.3.12), which catalyzes the first step in the biosynthesis of leucine, the condensation of acetyl-CoA and alpha-ketoisovalerate to form 2-isopropylmalate.
- Homocitrate synthase (EC 4.1.3.21) (gene nifV), which is involved in the biosynthesis of the iron-molybdenum cofactor of nitrogenase and catalyzes the condensation of acetyl-CoA and alpha-ketoglutarate into homocitrate.
- Soybean late nodulin 56.
- Methanococcus jannaschii hypothetical proteins MJ0503, MJ1195 and MJ1392.
Two conserved regions were selected as signature patterns for these enzymes. The first region is located in the N-terminal section, while the second region is located in the central section and contains two conserved histidine residues which could be implicated in the catalytic mechanism.
[0825] Consensus pattern: L-R-[DE]-G-x-Q-x(10)-K-
Consensus pattern: [LIVMFW]-x(2)-H-x-H-[DN]-D-x-G-x-[GAS]-x-[GASLI]-

[0826] [ 1] Wang S.-Z., Dean D.R., Chen J.-S., Johnson J.L. J. Bacteriol. 173:3041-3046(1991). 
[0827] 278. Hydroxymethylglutaryl-coenzyme A synthase active site (HMG COA synt)

Hydroxymethylglutaryl-coenzyme A synthase (EC 4.1.3.5) (HMG-CoA synthase) catalyzes the condensation of acetyl-CoA with acetoacetyl-CoA to produce HMG-CoA and CoA [1]. In vertebrates there are two isozymes located in different subcellular compartments: a cytosolic form, which is the starting point of the mevalonate pathway that leads to cholesterol and other sterolic and isoprenoid compounds, and a mitochondrial form responsible for ketone body biosynthesis. HMG-CoA synthase is also found in other eukaryotes such as insects, plants and fungi. A cysteine is known to act as the catalytic nucleophile in the first step of the reaction, the acetylation of the enzyme by acetyl-CoA. The conserved region around this active site residue was used as a signature pattern.

[0828] Consensus pattern: N-x-[DN]-[IV]-E-G-[IV]-D-x(2)-N-A-C-[FY]-x-G [C is the active site residue]-

[0829] [ 1] Rokosz L.L., Boulton D.A., Butkiewicz E.A., Sanyal G., Cueto M.A., Lachance P.A., Hermes J.D. Arch. Biochem. Biophys. 312:1-13(1994).

[0830] 279. HMG (high mobility group) box
[0831] 280. HSF-type DNA-binding domain signature

[0832] Heat shock factor (HSF) is a DNA-binding protein that specifically binds heat shock promoter elements (HSE). HSE is a palindromic element rich with repetitive purine and pyrimidine motifs: 5'-nGAAnnTTCnnGAAnnTTCn-3'. HSF is expressed at normal temperatures but is activated by heat shock or chemical stressors [1,2]. The sequences of HSF from various species show extensive similarity in a region of about 90 amino acids, which has been shown [3] to bind DNA. Some other proteins also contain an HSF domain. These are:
- Yeast SFL1, a protein involved in cell surface assembly and regulation of the gene related to flocculation (asexual cell aggregation) [4].
- Yeast transcription factor SKN7 (or BRY1 or POS9), which binds to the promoter elements SCB and MCB essential for the control of G1 cyclin expression [5].
- Yeast MGA1.
- Yeast hypothetical protein YJR147w.
A pattern was derived from the most conserved part of the HSF DNA-binding domain, its central region.
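
For illustration, the HSE consensus given above can be searched for in a nucleotide sequence by treating each "n" as any base. The sketch below is hedged accordingly: the helper names `motif_to_regex` and `find_sites` and the example DNA string are hypothetical, chosen only to show the idea, and only the single strand supplied is scanned.

```python
import re

HSE = "nGAAnnTTCnnGAAnnTTCn"  # consensus quoted in the text, 5' to 3'


def motif_to_regex(motif):
    """Turn a DNA consensus into a regex, reading n/N as any base."""
    return "".join("[ACGT]" if base in "nN" else base.upper() for base in motif)


def find_sites(dna, motif=HSE):
    """Return (0-based start, matched text) for every occurrence.

    A lookahead is used so that overlapping occurrences are also reported.
    """
    scanner = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [(m.start(), m.group(1)) for m in scanner.finditer(dna.upper())]


# Hypothetical promoter fragment containing one HSE-like element.
promoter = "ATCGAACGTTCTAGAAGCTTCGAT"
print(find_sites(promoter))
```

The same two helpers can be reused for other short DNA consensus sequences written with "n" as a wildcard.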

[0833] Consensus pattern: L-x(3)-[FY]-K-H-x-N-x-[STAN]-S-F-[LIVM]-R-Q-L-[NH]-x-Y-x-[FYW]-[RKH]-K-[LIVM]-

[ 1] Sorger P.K. Cell 65:363-366(1991).
[ 2] Mager W.H., Moradas Ferreira P. Biochem. J. 290:1-13(1993).
[ 3] Vuister G.W., Kim S.-J., Orosz A., Marquardt J., Wu C., Bax A. Nat. Struct. Biol. 1:605-613(1994).
[ 4] Fujita A., Kikuchi Y., Kuhara S., Misumi Y., Matsumoto S., Kobayashi H. Gene 85:321-328(1989).
[ 5] Morgan B.A., Bouquin N., Merrill G.F., Johnston L.H. EMBO J. 14:5679-5689(1995).

[0834] 281. Heat shock hsp20 proteins family profile

Prokaryotic and eukaryotic organisms respond to heat shock or other environmental stress by inducing the synthesis of proteins collectively known as heat-shock proteins (hsp) [1]. Amongst them is a family of proteins with an average molecular weight of 20 Kd, known as the hsp20 proteins [2 to 5]. These seem to act as chaperones that can protect other proteins against heat-induced denaturation and aggregation. Hsp20 proteins seem to form large heterooligomeric aggregates; their family is currently composed of the following members:
- Vertebrate heat shock protein hsp27 (hsp25), induced by a variety of environmental stresses.
- Drosophila heat shock proteins hsp22, hsp23, hsp26, hsp27, hsp67BA and BC.
- Caenorhabditis elegans hsp16 multigene family.
- Fungal HSP26 (budding yeast) and hsp30 (Neurospora crassa and Aspergillus nidulans).
- Plant small hsp's. Plants have four classes of hsp20: classes I and II, which are cytoplasmic, class III, which is chloroplastic, and class IV, which is found in the endomembrane.
- Alpha-crystallin A and B chains. Alpha-crystallin is an abundant constituent of the eye lens of most vertebrate species. Its main function appears to be to maintain the correct refractive index of the lens. It is also found in other tissues where it seems to act as a chaperone [6].
- Schistosoma mansoni major egg antigen p40. Structurally p40 is built of two tandem hsp20 domains.
- A variety of prokaryotic proteins: ibpA and ibpB from Escherichia coli, hsp18 from Clostridium acetobutylicum, spore protein SP21 (hspA) from Stigmatella aurantiaca, Mycobacterium leprae 18 Kd antigen and Mycobacterium tuberculosis 14 Kd antigen.
- Methanococcus jannaschii hypothetical protein MJ0285.
Structurally, this family is characterized by the presence of a conserved C-terminal domain of about 100 residues. The profile developed to detect members of the hsp20 family is based on an alignment of this domain.
[0835] -Sequences known to belong to this class detected by the profile: ALL. 

[ 1] Lindquist S., Craig E.A. Annu. Rev. Genet. 22:631-677(1988).
[ 2] de Jong W.W., Leunissen J.A.M., Voorter C.E.M. Mol. Biol. Evol. 10:103-126(1993).
[ 3] Caspers G.J., Leunissen J.A.M., de Jong W.W. J. Mol. Evol. 40:238-248(1995).
[ 4] Jaenicke R., Creighton T.E. Curr. Biol. 3:234-235(1993).
[ 5] Jakob U., Buchner J. Trends Biochem. Sci. 19:205-211(1994).
[ 6] Groenen P.J.T.A., Merck K.B., de Jong W.W., Bloemendal H. Eur. J. Biochem. 225:1-9(1994).
[0836] 282. Heat shock hsp70 proteins family signatures 

[0837] Prokaryotic and eukaryotic organisms respond to heat shock or other environmental stress by the induction of the synthesis of proteins collectively known as heat-shock proteins (hsp) [1]. Amongst them is a family of proteins with an average molecular weight of 70 Kd, known as the hsp70 proteins [2,3,4]. In most species, there are many proteins that belong to the hsp70 family. Some of them are expressed under unstressed conditions. Hsp70 proteins can be found in different cellular compartments (nuclear, cytosolic, mitochondrial, endoplasmic reticulum, etc.). Some of the hsp70 family proteins are listed below:
- In Escherichia coli and other bacteria, the main hsp70 protein is known as the dnaK protein. A second protein, hscA, has been recently discovered. dnaK is also found in the chloroplast genome of red algae.
- In yeast, at least ten hsp70 proteins are known to exist: SSA1 to SSA4, SSB1, SSB2, SSC1, SSD1 (KAR2), SSE1 (MSI3) and SSE2.
- In Drosophila, there are at least eight different hsp70 proteins: HSP70, HSP68, and HSC-1 to HSC-6.
- In mammals, there are at least eight different proteins: HSPA1 to HSPA6, HSC70, and GRP78 (also known as the immunoglobulin heavy chain binding protein (BiP)).
- In the sugar beet yellow virus (SBYV), a hsp70 homolog has been shown [5] to exist.
- In archaebacteria, hsp70 proteins are also present [6].
All proteins belonging to the hsp70 family bind ATP. A variety of functions has been postulated for hsp70 proteins. It now appears [7] that some hsp70 proteins play an important role in the transport of proteins across membranes. They also seem to be involved in protein folding and in the assembly/disassembly of protein complexes [8]. Three signature patterns for the hsp70 family of proteins were derived: the first centered on a conserved pentapeptide found in the N-terminal section of these proteins; the two others on conserved regions located in the central part of the sequence.

Consensus pattern: [IV]-D-L-G-T-[ST]-x-[SC]-

Co7:^nsu7rattrmTlUVMn^^^^^^^ 

Consensus pattern: [LIVMY]-x-[LIVMF]-x-G-G-x-[ST]-x-[LIVM]-P-x-[LIVM]-x-[DEQKRSTA]-

[ 1] Lindquist S., Craig E.A. Annu. Rev. Genet. 22:631-677(1988). 

[ 2] Pelham H.R.B. Cell 46:959-961(1986).
[ 3] Pelham H.R.B. Nature 332:776-777(1988).
[ 4] Craig E.A. BioEssays 11:48-52(1989).
[ 5] Agranovsky A.A., Boyko V.P., Karasev A.V., Koonin E.V., Dolja V.V. J. Mol. Biol. 217:603-610(1991).
[ 6] Gupta R.S., Singh B. J. Bacteriol. 174:4594-4605(1992).
[ 7] Deshaies R.J., Koch B.D., Schekman R. Trends Biochem. Sci. 13:384-388(1988).
[8] Craig E.A., Gross C.A. Trends Biochem. Sci. 16:135-140(1991). 

[0839] 283. Heat shock hsp90 proteins family signature

Prokaryotic and eukaryotic organisms respond to heat shock or other environmental stress by the induction of the synthesis of proteins collectively known as heat-shock proteins (hsp) [1]. Amongst them is a family of proteins with an average molecular weight of 90 Kd, known as the hsp90 proteins. Proteins known to belong to this family are:
- Escherichia coli and other bacteria heat shock protein C62.5 (gene htpG).
- Vertebrate hsp 90-alpha (hsp 86) and hsp 90-beta (hsp 84).
- Drosophila hsp 82 (hsp 83).
- Trypanosoma cruzi hsp 85.
- Plants Hsp82 or Hsp83.
- Yeast and other fungi HSC82 and HSP82.
- The endoplasmic reticulum protein 'endoplasmin' (also known as Erp99 in mouse, GRP94 in hamster and hsp108 in chicken).
The exact function of hsp90 proteins is not yet known. In higher eukaryotes, hsp90 has been found associated with steroid hormone receptors, with tyrosine kinase oncogene products of retroviruses, with eIF2alpha kinase, and with actin and tubulin. Hsp90 are probable chaperonins that possess ATPase activity [2,3]. As a signature pattern for the hsp90 family of proteins, a highly conserved region found in the N-terminal part of these proteins was selected.

[0840] Consensus pattern: Y-x-[NQH]-K-[DE]-[IVA]-F-L-R-[ED]-

[ 1] Lindquist S., Craig E.A. Annu. Rev. Genet. 22:631-677(1988). 

[ 2] Nadeau K., Das A., Walsh C.T. J. Biol. Chem. 268:1479-1487(1993).

[ 3] Jakob U., Buchner J. Trends Biochem. Sci. 19:205-211(1994).

This large family of DNA-binding helix-turn-helix proteins includes Cro Swiss:P03036 and CI Swiss:P03034.
U^^aveTrhemiring-imnip^^ 

converted to bilirubin by biliverdin reductase. In mammals there are three isozymes of heme oxygenase: HO-1 to HO-3. The first two isozymes differ in their tissue expression and their inducibility: HO-1 is highly inducible by its substrate heme and by various non-heme substances, while HO-2 is non-inducible. It has been suggested [2] that HO-2 could be implicated in the production of carbon monoxide in the brain, where it is said to act as a neurotransmitter. In the genome of the chloroplast of red algae as well as in cyanobacteria, there is a heme oxygenase (gene pbsA) that is the key enzyme in the synthesis of the chromophoric part of the photosynthetic antennae [3]. A heme oxygenase is also present in the bacterium Corynebacterium diphtheriae (gene hmuO), where it is involved in the acquisition of iron from the host heme [4]. There is, in the central section of these enzymes, a well conserved region centered on a histidine residue which is proposed to play a key role in binding the substrate heme at the active center of the enzyme. This region was used as a signature pattern.

[0844] Consensus pattern: L-[IV]-A-H-[STACH]-Y-[STV]-[RT]-Y-[LIVM]-G [H binds the heme]
[ 1] Maines M.D. FASEB J. 2:2557-2568(1988).
[ 2] Barinaga M. Science 259:309-309(1993).
[ 3] Richaud C., Zabulon G. Proc. Natl. Acad. Sci. U.S.A. 94:11736-11741(1997).
[ 4] Schmitt M.P. J. Bacteriol. 179:838-845(1997).

[0845] 286. Hepatitis core antigen. 

[0846] The core antigen of hepatitis viruses possesses a carboxyl terminus rich in arginine. On this basis it was predicted that the core antigen would bind DNA [1]. There is some experimental evidence to support this [2].
[0847] [1] Pasek M, Goto T, Gilbert W, Zink B, Schaller H, Mckay P, Leadbetter G, Murray K. Nature 1979;282:575-579. [2] Gallina A, Bonelli F, Zentilin L, Rindi G, Muttini M, Milanesi G. J Virol 1989;63:4645-4652.
[0848] 287. Histidine biosynthesis protein

[0849] Proteins involved in steps 4 and 6 of the histidine biosynthesis pathway are contained in this family. Histidine 
is formed by several complex and distinct biochemical reactions catalysed by eight enzymes. The enzymes in this 
Pfam entry are called His6 and His7 in eukaryotes and HisA and HisF in prokaryotes. 
[0850] [1] Fani R, Tamburini E, Mori E, Lazcano A, Lio P, Barberio C, Casalone E, Cavalieri D, Perito B, Polsinelli M. Gene 1997;197:9-17. [2] Fani R, Lio P, Chiarelli I, Bazzicalupo M. J Mol Evol 1994;38:489-495.
[0851] 288. Histone deacetylase family 

[0852] Histones can be reversibly acetylated on several lysine residues. Regulation of transcription is caused in part by this mechanism. Histone deacetylases catalyse the removal of the acetyl group. Histone deacetylases are related to other proteins [1].

[0853] [1] Leipe DD, Landsman D. Nucleic Acids Res 1997;25:3693-3697.
[0854] 289. Histidinol dehydrogenase signature 

Histidinol dehydrogenase (EC 1.1.1.23) (HDH) catalyzes the terminal step in the biosynthesis of histidine in bacteria, fungi, and plants, the four-electron oxidation of L-histidinol to histidine. In bacteria HDH is a single chain polypeptide; in fungi it is the C-terminal domain of a multifunctional enzyme which catalyzes three different steps of histidine biosynthesis; and in plants it is expressed as a nuclear-encoded protein precursor which is exported to the chloroplast [1]. As a signature pattern, a highly conserved region located in the central part of HDH was selected. This region does not correspond to the part of the enzyme that, in most but not all HDH sequences, contains a cysteine residue which, in Salmonella typhimurium, has been said [2] to be important for the catalytic activity of the enzyme.

[0855] Consensus pattern: I-D-x(2)-A-G-P-[ST]-E-[LIVS]-[LIVMA](3)-[AC]-x(3)-A-x(4)-[LIVM]-[AV]-[SACL]-[DE]-[LIVMFC]-[LIVM]-[SA]-x(2)-E-H-

[ 1] Nagai A., Ward E., Beck J., Tada S., Chang J.-Y., Scheidegger A., Ryals J. Proc. Natl. Acad. Sci. U.S.A. 88:4133-4137(1991).
[ 2] Grubmeyer C.T., Gray W.R. Biochemistry 25:4778-4784(1986).

[0856] 290. Homoserine dehydrogenase signature 

Homoserine dehydrogenase (EC 1.1.1.3) (HDh) [1,2] catalyzes the NAD-dependent reduction of aspartate beta-semialdehyde into homoserine. This reaction is the third step in a pathway leading from aspartate to homoserine. The latter participates in the biosynthesis of threonine and then isoleucine as well as in that of methionine. HDh is found either as a single chain protein, as in some bacteria and yeast, or as a bifunctional enzyme consisting of an N-terminal aspartokinase domain and a C-terminal HDh domain, as in bacteria such as Escherichia coli and in plants. As a signature pattern, the best conserved region of HDh has been selected. This is a segment of 23 to 24 residues located in the central section that contains two conserved aspartate residues.

[0857] Consensus pattern: A-x(3)-G-[LIVMFY]-[STAG]-x(2,3)-[DNS]-P-x(2)-D-[LIVM]-x-G-x-D-x(3)-K-

[ 1] Thomas D., Barbey R., Surdin-Kerjan Y. FEBS Lett. 323:289-293(1993).
[ 2] Cami B., Clepet C., Patte J.-C. Biochimie 75:487-495(1993).

[0858] 291. haloacid dehalogenase-like hydrolase

[0859] This family is structurally different from the alpha/beta hydrolase family (a b hydrolase). This family includes L-2-haloacid dehalogenase, epoxide hydrolases and phosphatases. The structure of the family consists of two domains. One is an inserted four-helix bundle, which is the least well conserved region of the alignment, between residues 16 and 96 of Swiss:P24069. The rest of the fold is composed of the core alpha/beta domain.
[1] Hisano T, Hata Y, Fujii T, Liu JQ, Kurihara T, Esaki N, Soda K. J Biol Chem 1996;271:20322-20330.

[0860] 292. DEAD and DEAH box families ATP-dependent helicases signatures (helicase_C) 
A number of eukaryotic and prokaryotic proteins have been characterized [1,2,3] on the basis of their structural similarity. They all seem to be involved in ATP-dependent nucleic-acid unwinding. Proteins currently known to belong to this family are:
- Initiation factor eIF-4A. Found in eukaryotes, this protein is a subunit of a high molecular weight complex involved in 5' cap recognition and the binding of mRNA to ribosomes. It is an ATP-dependent RNA helicase.
- PRP5 and PRP28. These yeast proteins are involved in various ATP-requiring steps of the pre-mRNA splicing process.
- PL10, a mouse protein expressed specifically during spermatogenesis.
- An3, a Xenopus putative RNA helicase, closely related to PL10.
- SPP81/DED1 and DBP1, two yeast proteins probably involved in pre-mRNA splicing and related to PL10.
- Caenorhabditis elegans helicase glh-1.
- MSS116, a yeast protein required for mitochondrial splicing.
- SPB4, a yeast protein involved in the maturation of 25S ribosomal RNA.
- p68, a human nuclear antigen. p68 has ATPase and DNA-helicase activities in vitro. It is involved in cell growth and division.
- Rm62 (p62), a Drosophila putative RNA helicase related to p68.
- DBP2, a yeast protein related to p68.
- DHH1, a yeast protein.
- DRS1, a yeast protein involved in ribosome assembly.
- MAK5, a yeast protein involved in maintenance of dsRNA killer plasmid.

putative RNA helicase. Signature patterns were developed for both subfamilies.
[0861] Consensus pattern: [LIVMF](2)-D-E-A-D-[RKEN]-x-[LIVMFYGSTN]-

entry <PDOC00017>

[ 1] Schmid S.R., Linder P. Mol. Microbiol. 6:283-292(1992).

1 21 LlndrpjL^ip.. AshburnerM.. Leroy p.. Niete^^ 

(1989). 

[ 3] Wassarman D.A., Steitz J.A. Nature 349:463-464(1991).
[ 4] Hodgman T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).
[ 5] Harosh I., Deschavanne P. Nucleic Acids Res. 19:6331-6331(1991).
[ 6] Koonin E.V., Senkevich T.G. J. Gen. Virol. 73:989-993(1992).

[0862] 293. Heme-binding domain in cytochrome b5 and oxidoreductases (heme_1)
[0863] Cytochrome b5 is a membrane-bound hemoprotein which acts as an electron carrier for several membrane-

of a number of oxidoreductases consists of the juxtaposition of a heme-binding domain homologous to that of b5 and either a flavodehydrogenase or a molybdopterin domain. These enzymes are:

- Lactate dehydrogenase (EC 1.1.2.3) [2], an enzyme that consists of a flavodehydrogenase domain and a heme-

M?' '.""rinT^EC ^JTlTatey enzyme involved in the first step of nitrate assimilation in plants, fungi and 
■ rcS^aTsTcoSJI^^^ domain (see <PDgC00484». a heme-binding domain called cyU. 

. rreoSeTc^ss^^ 

taining amino acids. AiS^nsists of a molybdopterin domain and a heme-b.nding domain. 
This family of proteins also includes: 

- TU-36B, a Drosophila muscle protein of unknown function [6]. 

- Fission yeast hypothetical protein SpAC1F12.10c. 
- Yeast hypothetical protein YMR073C.

- Yeast hypothetical protein YMR272c. 

[0864] A segment which includes the first of the two histidine heme ligands was used as a signature pattern for the heme-binding domain of the cytochrome b5 family.

[0865] Consensus pattern: [FY]-[LIVMK]-x(2)-H-P-[GA]-G [H is a heme axial ligand]-

[1] Ozols J. Biochim. Biophys. Acta 997:121-130(1989).
[2] Guiard B. EMBO J. 4:3265-3272(1985).
[3] Calza R., Huttner E., Vincentz M., Rouze P., Galangau F., Vaucheret H., Cherel I., Meyer C., Kronenberger J., Caboche M. Mol. Gen. Genet. 209:552-562(1987).
[4] Crawford N.M., Smith M., Bellissimo D., Davis R.W. Proc. Natl. Acad. Sci. U.S.A. 85:5006-5010(1988).
[5] Guiard B., Lederer F. Eur. J. Biochem. 100:441-453(1979).
[6] Levin R.J., Boychuk P.L., Croniger C.M., Kazzaz J.A., Rozek C.E. Nucleic Acids Res. 17:6349-6367(1989).

[0866] 294. Hexapeptide-repeat containing-transferases signature 

On the basis of sequence similarity, a number of transferases have been proposed [1,2,3,4] to belong to a single family. These proteins are:
- Serine acetyltransferase (EC 2.3.1.30) (SAT) (gene cysE), an enzyme involved in cysteine biosynthesis.
- Azotobacter chroococcum nitrogen fixation protein nifP. NifP is most probably a SAT involved in the optimization of nitrogenase activity.
- Escherichia coli thiogalactoside acetyltransferase (EC 2.3.1.18) (gene lacA), an enzyme involved in the biosynthesis of lactose.
- UDP-N-acetylglucosamine acyltransferase (EC 2.3.1.129) (gene lpxA), an enzyme involved in the biosynthesis of lipid A, a phosphorylated glycolipid that anchors the lipopolysaccharide to the outer membrane of the cell.
- UDP-3-O-[3-hydroxymyristoyl] glucosamine N-acyltransferase (EC 2.3.1.-) (gene lpxD or firA), which is also involved in the biosynthesis of lipid A.
- Chloramphenicol acetyltransferase (CAT) (EC 2.3.1.28) from Agrobacterium tumefaciens, Bacillus sphaericus, Escherichia coli plasmid IncFII NR79, Pseudomonas aeruginosa, and Staphylococcus aureus plasmid pIP630. These CAT are not evolutionarily related to the main family of CAT (see <PDOC00093>).
- Rhizobium nodulation protein nodL. NodL is an acetyltransferase involved in the O-acetylation of Nod factors.
- Bacterial maltose O-acetyltransferase (EC 2.3.1.79).
- Bacterial tetrahydrodipicolinate N-succinyltransferase (EC 2.3.1.117) (gene dapD), which catalyzes the fourth step in the biosynthesis of diaminopimelate and lysine from aspartate semialdehyde.
- Bacterial N-acetylglucosamine-1-phosphate uridyltransferase (EC 2.7.7.23) (gene glmU or gcaD or tms), an enzyme involved in peptidoglycan and lipopolysaccharide biosynthesis.
- Staphylococcus aureus protein capG, which is involved in biosynthesis of type 1 capsular polysaccharide.
- Yeast hypothetical protein YJL218w, which is highly similar to Escherichia coli lacA.
- Fission yeast hypothetical protein SpAC18B11.09c.
- Methanococcus jannaschii hypothetical protein MJ1064.
These proteins have been shown [3,4] to contain a repeat structure composed of tandem repeats of a [LIV]-G-x(4) hexapeptide which, in the tertiary structure of lpxA [5], has been shown to form a left-handed parallel beta helix. Our signature pattern is based on a fourfold repeat of this hexapeptide.
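
As a hedged illustration of the repeat structure described above, the following sketch counts how many [LIV]-G-x(4) hexapeptides occur back to back in a sequence. It uses the simple hexapeptide quoted in the text rather than the full consensus pattern, and the function name `longest_tandem_run` and the example sequence are hypothetical.

```python
import re

# A lookahead around a greedy capture: at every start position the capture
# holds the longest run of consecutive [LIV]-G-x(4) units beginning there.
_TANDEM_RUN = re.compile(r"(?=((?:[LIV]G.{4})+))")


def longest_tandem_run(sequence):
    """Return the largest number of back-to-back [LIV]-G-x(4) units."""
    best = 0
    for match in _TANDEM_RUN.finditer(sequence):
        best = max(best, len(match.group(1)) // 6)  # six residues per unit
    return best


# Hypothetical sequence with four consecutive hexapeptide units.
example = "MK" + "IGDNVT" + "LGAGSV" + "VGENAK" + "IGKGCR" + "PQ"
print(longest_tandem_run(example))
```

A run of four or more units would be in line with the fourfold repeat on which the signature pattern is based, although the real pattern places further constraints on individual positions.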

[0867] Consensus pattern: [LIV]-[GAED]-x(2)-[STAV]-x-[LIV]-x(3)-[LIVAC]-x-[LIV]-[GAED]-x(2)-[STAVR]-x-[LIV]-[GAED]-x(2)-[STAV]-x-[LIV]-x(3)-[LIV]-

[ 1] Downie J.A. Mol. Microbiol. 3:1649-1651(1989).
[ 2] Parent R., Roy P.H. J. Bacteriol. 174:2891-2897(1992).
[ 3] Vaara M. FEMS Microbiol. Lett. 97:249-254(1992).
[ 4] Vuorio R., Haerkonen T., Tolvanen M., Vaara M. FEBS Lett. 337:289-292(1994).
[ 5] Raetz C.R.H., Roderick S.L. Science 270:997-1000(1995).

[0868] 295. Hexokinases signature

Hexokinase (EC 2.7.1.1) [1,2] is an important glycolytic enzyme that catalyzes the phosphorylation of keto- and aldohexoses (e.g. glucose, mannose and fructose) using MgATP as the phosphoryl donor. In vertebrates there are four major isoenzymes, commonly referred to as types I, II, III and IV. Type IV hexokinase, which is often incorrectly designated glucokinase [3], is only expressed in liver and pancreatic beta-cells and plays an important role in modulating insulin secretion; it is a protein with a molecular mass of about 50 Kd. Hexokinases of types I to III, which have low Km values for glucose, have a molecular mass of about 100 Kd. Structurally they consist of a very small N-terminal hydrophobic membrane-binding domain followed by two highly similar domains of 450 residues. The first domain has lost its catalytic activity and has evolved into a regulatory domain. In yeast there are three different isozymes: hexokinase PI (gene HXK1), PII (gene HXKB), and glucokinase (gene GLK1). All three proteins have a molecular mass of about 50 Kd. All these enzymes contain one (or two in the case of types I to III isozymes) strongly conserved region which has been shown [4] to be involved in substrate binding. A pattern from that region has been derived.

[0869] Consensus pattern: [LIVM]-G-F-[TN]-F-S-[FY]-P-x(5)-[LIVM]-[DNST]-x(3)-[LIVM]-x(2)-W-T-K-x-[LF]-

[0870] [ 1] Middleton R.J. Biochem. Soc. Trans. 18:180-183(1990).
[ 2] Griffin L.D., Gelb B.D., Wheeler D.A., Davison D., Adams V., McCabe E.R. Genomics 11:1014-1024(1991).
[ 3] Cornish-Bowden A., Luz Cardenas M. Trends Biochem. Sci. 16:281-282(1991).
[ 4] Schirch D.M., Wilson J.E. Arch. Biochem. Biophys. 254:385-396(1987).

synthesized throughout the cell cycle.
[0872] Consensus pattern: [AC]-G-L-x-F-P-V- 

[ 1] Wells D.E., Brown D. Nucleic Acids Res. 19:2173-2188(1991).
[ 2] Thatcher T.H., Gorovsky M.A. Nucleic Acids Res. 22:174-179(1994).

Histone H4 is one of the four histones, along with H2A, H2B and H3, which forms the eukaryotic nucleosome core. Along with H3, it plays a central role in nucleosome formation. The sequence of histone H4 has remained almost invariant in more than 2 billion years of evolution [1,E1]. The region used as a signature pattern is a pentapeptide found in positions 14 to 18 of all H4 sequences. It contains a lysine residue which is often acetylated [2] and a histidine residue which is implicated in DNA-binding [3].
[0875] Consensus pattern: G-A-K-R-H- 

[ 1] Thatcher T.H., Gorovsky M.A. Nucleic Acids Res. 22:174-179(1994).
[ 2] Doenecke D., Gallwitz D. Mol. Cell. Biochem. 44:113-128(1982).
[ 3] Ebralidse K.K., Grachev S.A., Mirzabekov A.D. Nature 331:365-367(1988).

Histone H3 is one of the four histones, along with H2A, H2B and H4, which forms the eukaryotic nucleosome core. It is a highly conserved protein of 135 amino acid residues [1,2,E1]. The following proteins have been found to contain ...

The first one corresponds to a perfectly conserved heptapeptide in the N-terminal part of H3. The second one is derived from a conserved region in the central section of H3.
[0877] Consensus pattern: K-A-P-R-K-Q-L- 
Consensus pattern: P-F-x-[RA]-L-[VA]-[KRQ]-[DEG]-[IV]-

[ 1] Wells D.E., Brown D. Nucleic Acids Res. 19:2173-2188(1991).
[ 2] Thatcher T.H., Gorovsky M.A. Nucleic Acids Res. 22:174-179(1994).
[ 3] Sullivan K.F., Hechenberger M., Masri K. J. Cell Biol. 127:581-592(1994).
[ 4] Stoler S., Keith K.C., Curnick K.E., Fitzgerald-Hayes M. Genes Dev. 9:573-586(1995).

Histone H2B is one of the four histones, along with H2A, H3 and H4, which forms the eukaryotic nucleosome core. Using alignments of histone H2B sequences [1,2,E1], a conserved region was selected in the C-terminal part of H2B.
Consensus pattern: [KR]-E-[LIVM]-[EQ]-T-x(2)-[KR]-x-[LIVM](2)-x-[PAG]-[DE]-L-x-[KR]-H-A-[LIVM]-[STA]-E-G-

[ 1] Wells D.E., Brown D. Nucleic Acids Res. 19:2173-2188(1991).
[ 2] Thatcher T.H., Gorovsky M.A. Nucleic Acids Res. 22:174-179(1994).

[0881] 297. 'Homeobox' domain signature and profile (home1)
The 'homeobox' is a protein domain of 60 amino acids [1 to 5,E1] first identified in a number of Drosophila homeotic and segmentation proteins. It has since been found to be extremely well conserved in many other animals, including vertebrates. This domain binds DNA through a helix-turn-helix type of structure. Some of the proteins which contain a homeobox domain play an important role in development. Most of these proteins are known to be sequence-specific DNA-binding transcription factors. The homeobox domain has also been found to be very similar to a region of the yeast mating type proteins. These are sequence-specific DNA-binding proteins that act as master switches in yeast differentiation by controlling gene expression in a cell type-specific fashion. A schematic representation of the homeobox domain is shown below. The helix-turn-helix region is shown by the symbols H (for helix), and t (for turn).
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xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxHHHHHHHHtttPiHHHHHHHHxxxxxxxxxx 1 1 1 1 1 1 



1 10 20 30 40 50 60
The pattern to detect homeobox sequences that was developed is 24 residues long and spans positions 34 to 57 of the homeobox domain.

[0882] Consensus pattern: [LIVMFYG]-[ASLVR]-x(2)-[LIVMSTACN]-x-[LIVM]-x(4)-[LIV]-[RKNQESTAIY]-[LIVFSTNKH]-W-[FYVC]-x-[NDQTAH]-x(5)-[RKNAIMW]-

[ 1] Gehring W.J. (In) Guidebook to the homeobox genes, Duboule D., Ed., pp1-10, Oxford University Press, Oxford, (1994).
[ 2] Buerglin T.R. (In) Guidebook to the homeobox genes, Duboule D., Ed., pp25-72, Oxford University Press, Oxford, (1994).
[ 3] Gehring W.J. Trends Biochem. Sci. 17:277-280(1992).
[ 4] Gehring W.J., Hiromi Y. Annu. Rev. Genet. 20:147-173(1986).
[ 5] Schofield P.N. Trends Neurosci. 10:3-6(1987).



[0883] 'Homeobox' antennapedia-type protein signature (home2)

The homeotic Hox proteins are sequence-specific transcription factors. They are part of a developmental regulatory system that provides cells with specific positional identities on the anterior-posterior (A-P) axis [1]. The Hox proteins contain a 'homeobox' domain. In Drosophila and other insects, there are eight different Hox genes that are encoded in two gene complexes, ANT-C and BX-C. In vertebrates there are 38 genes organized in four complexes. In six of the eight Drosophila Hox genes the homeobox domain is highly similar and a conserved hexapeptide is found five to sixteen amino acids upstream of the homeobox domain. The six Drosophila proteins that belong to this group are antennapedia (Antp), abdominal-A (abd-A), deformed (Dfd), proboscipedia (pb), sex combs reduced (scr) and ultrabithorax (ubx); they are collectively known as the 'antennapedia' subfamily. In vertebrates the corresponding Hox genes are known [2] as Hox-A2, A3, A4, A5, A6, A7, Hox-B1, B2, B3, B4, B5, B6, B7, B8, Hox-C4, C5, C6, C8, Hox-D1, D3, D4 and D8. Caenorhabditis elegans lin-39 and mab-5 are also members of the 'antennapedia' subfamily. As a signature pattern for this subfamily of homeobox proteins, the conserved hexapeptide was used.
[0884] Consensus pattern: [LIVMFE]-[FY]-P-W-M-[KRQTA]-

[ 1] McGinnis W., Krumlauf R. Cell 68:283-302(1992).
[ 2] Scott M.P. Cell 71:551-553(1992).



[0885] 'Homeobox' engrailed-type protein signature (home3)

[0886] Most proteins which contain a 'homeobox' domain can be classified [1,2], on the basis of their sequence characteristics, in three subfamilies: engrailed, antennapedia and paired. Proteins currently known to belong to the engrailed subfamily are: - Drosophila segmentation polarity protein engrailed (en), which specifies the body segmentation pattern and is required for the development of the central nervous system. - Drosophila invected protein (inv). - Silk moth proteins engrailed and invected, which may be involved in the compartmentalization of the silk gland. - Honeybee E30 and E60. - Grasshopper (Schistocerca americana) G-En. - Mammalian and birds En-1 and En-2. - Zebrafish Eng-1, -2 and -3. - Sea urchin (Tripneustes gratilla) SU-HB-en. - Leech (Helobdella triserialis) Ht-En. - Caenorhabditis elegans ceh-16. Engrailed homeobox proteins are characterized by the presence of a conserved region of some 20 amino-acid residues located at the C-terminal of the 'homeobox' domain. As a signature pattern for this subfamily of proteins, a stretch of eight perfectly conserved residues in this region was used.

[0887] Consensus pattern: L-M-A-[EQ]-G-L-Y-N-

[ 1] Scott M.P., Tamkun J.W., Hartzell G.W. III Biochim. Biophys. Acta 989:25-48(1989).
[ 2] Gehring W.J. Science 236:1245-1252(1987).
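
The consensus patterns listed in these entries use a compact notation: residues separated by hyphens, square brackets for alternatives, x for any residue, and a parenthesized count for repeats. As an illustration only, the following minimal Python sketch translates such a pattern into a regular expression and scans a sequence with it. The function name prosite_to_regex and the test sequence are hypothetical and are not part of the original disclosure; the sketch assumes the simple notation shown above and does not handle anchors or other extensions.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string.

    Hypothetical helper: handles single residues, [ABC] alternatives,
    {ABC} exclusions, x wildcards, and (n) or (n,m) repeat counts.
    """
    parts = []
    for element in pattern.strip().strip("-").split("-"):
        element = element.strip()
        # Split an element such as "x(2,5)" into its core and optional repeat.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # excluded residues
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{%s}" % low
        else:
            repeat = "{%s,%s}" % (low, high)
        parts.append(core + repeat)
    return "".join(parts)


# Engrailed-type signature from the entry above.
engrailed = prosite_to_regex("L-M-A-[EQ]-G-L-Y-N-")
print(engrailed)  # LMA[EQ]GLYN

# Made-up test sequence, used only to exercise the scan.
hit = re.search(engrailed, "AAALMAQGLYNHH")
print(hit.start() if hit else None)  # 3
```

A match only indicates that the sequence contains the signature; whether the protein actually belongs to the family still has to be judged from the surrounding sequence context.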



[0888] 298. Isocitrate lyase signature (ICL)

Isocitrate lyase (EC 4.1.3.1) [1,2] is an enzyme that catalyzes the conversion of isocitrate to succinate and glyoxylate. This is the first step in the glyoxylate bypass, an alternative to the tricarboxylic acid cycle in bacteria, fungi and plants. A cysteine, a histidine and a glutamate or aspartate have been found to be important for the enzyme's catalytic activity. Only one cysteine residue is conserved between the sequences of the fungal, plant and bacterial enzymes; it is located in the middle of a conserved hexapeptide that can be used as a signature pattern for this type of enzyme.

[0889] Consensus pattern: K-[KR]-C-G-H-[LMQ] [C is a putative active site residue]-



[ 1] Beeching J.R. Protein Seq. Data Anal. 2:463-466(1989). 
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[ 2] Atomi H.. Ueda M.. Hikida M.. Hishida T. Teranishi Y. Tanaka A. J. Biochem. 107:262-266(1990). 

This family includes initiation factor 2B alpha, beta and delta subunits from eukaryotes, related proteins from archaebacteria and IF-2 from prokaryotes. Initiation factor 2 binds to Met-tRNA, GTP and the small ribosomal subunit.
[0892] [1] Kyrpides NC, Woese CR; Proc Natl Acad Sci U S A 1998;95:3726-3730.

Initiation factor 3 (IF-3) is one of the three factors required for the initiation of protein biosynthesis in bacteria. IF-3 is thought to function as a fidelity factor during the assembly of the ternary initiation complex, which consists of the 30S ribosomal subunit, the initiator tRNA and the messenger RNA. IF-3 binds to the 30S ribosomal subunit. IF-3 is a basic protein of 141 to 212 residues. The chloroplast initiation factor IF-3(chl) is a protein that enhances the poly(A,U,G)-dependent binding of the initiator tRNA to chloroplast ribosomal 30S subunits. In its mature form it is a protein of about 400 residues whose central section is evolutionary related to the sequence of bacterial IF-3 [2]. As a signature pattern, a highly conserved region was selected, located in the central section of bacterial IF-3.
[0894] Consensus pattern: [KR]-[LIVM](2)-[DN]-[FY]-[GSN]-[KR]-[LIVMFYS]-x-[FY]-[DEQTH]-x(2)-[KRQ]-

[ 1] Liveris D., Schwartz J.J., Geertman R., Schwartz I. FEMS Microbiol. Lett. 112:211-216(1993).
[ 2] Lin Q., Ma L., Burkhart W., Spremulli L.L. J. Biol. Chem. 269:9436-9444(1994).

[0895] 301. Imidazoleglycerol-phosphate dehydratase signatures (IGPD)

Imidazoleglycerol-phosphate dehydratase (EC 4.2.1.19) is the enzyme that catalyzes the seventh step in the biosynthesis of histidine in bacteria, fungi and plants. In most organisms it is a monofunctional protein of about 22 to 29 Kd. In some bacteria such as Escherichia coli, it is the C-terminal domain of a bifunctional protein that includes a histidinol-phosphatase domain [1]. Two signature patterns were developed that each include two consecutive histidine residues.
[0896] Consensus pattern: [LIVMY]-[DE]-x-H-H-x(2)-E-x(2)-[GCA]-[LIVM]-[STAC]-[LIVM]-
Consensus pattern: G-x-[DN]-x-H-H-x(2)-E-[STAGC]-x-[FY]-K-

[0897] [ 1] Carlomagno M.S., Chiariotti L., Alifano P., Nappo A.G., Bruni C.B. J. Mol. Biol. 203:585-606(1988).
[0898] 302. Indole-3-glycerol phosphate synthase signature (IGPS)

Indole-3-glycerol phosphate synthase (EC 4.1.1.48) (IGPS) catalyzes the fourth step in the biosynthesis of tryptophan: the ring closure of 1-(2-carboxy-phenylamino)-1-deoxyribulose into indol-3-glycerol-phosphate. In some bacteria, IGPS is a single chain enzyme. In others, such as Escherichia coli, it is the N-terminal domain of a bifunctional enzyme that also catalyzes N-(5'-phosphoribosyl)anthranilate isomerase (PRAI) activity, the third step of tryptophan biosynthesis. In fungi, IGPS is the central domain of a trifunctional enzyme that also contains a PRAI C-terminal domain and a glutamine amidotransferase N-terminal domain. The N-terminal section of IGPS contains a highly conserved region which X-ray crystallography studies [1] have shown to be part of the active site cavity. This region was used as a signature pattern.

[r9rc""^su:pT«e^ 

[0900] [ 1] Wilmanns M., Priestle J.P., Niermann T., Jansonius J.N. J. Mol. Biol. 223:477-507(1992).

Dihydroxy-acid and 6-phosphogluconate dehydratases. Two dehydratases have been shown [1] to be evolutionary related: - Dihydroxy-acid dehydratase (EC 4.2.1.9) (gene ilvD or ILV3), which catalyzes the fourth step in the biosynthesis of isoleucine and valine, the dehydratation of 2,3-dihydroxy-isovaleric acid into alpha-ketoisovaleric acid. - 6-phosphogluconate dehydratase (EC 4.2.1.12) (gene edd), which catalyzes the first step in the Entner-Doudoroff pathway, the dehydratation of 6-phospho-D-gluconate into 6-phospho-2-dehydro-3-deoxy-D-gluconate. - Escherichia coli hypothetical protein yjhG. Both enzymes are proteins of about 600 amino acid residues. Two highly conserved regions have been developed as signature patterns. The first pattern is located in the N-terminal part and contains a cysteine that could be involved in the binding of a 2Fe-2S iron-sulfur cluster [2]. The second pattern is located in the C-terminal half.

[0903] Consensus pattern: C-D-K-x(2)-P-[GA]-x(3)-[GA] [The C could be a 2Fe-2S ligand]

Consensus pattern: [SA]-L-[LIVM]-T-D-[GA]-R-[LIVMF]-S-[GA]-[GAV]-[ST]-
[0904] [ 1] Egan S.E., Fliege R., Tong S., Shibata A., Wolf R.E. Jr., Conway T. J. Bacteriol. 174:4638-4646(1992).
[ 2] Velasco J.A., Cansado J., Pena M.C., Kawakami T., Laborda J., Notario V. Gene 137:179-185(1993).
[0905] 305. IMP dehydrogenase / GMP reductase signature

IMP dehydrogenase (EC 1.1.1.205) (IMPDH) catalyzes the rate-limiting reaction of de novo GTP biosynthesis, the NAD-dependent reduction of IMP into XMP [1]. Inhibition of IMP dehydrogenase activity results in the cessation of DNA synthesis. As IMP dehydrogenase is associated with cell proliferation, it is a possible target for cancer chemotherapy. Mammalian and bacterial IMPDHs are tetramers of identical chains. There are two IMP dehydrogenase isozymes in humans [2]. GMP reductase (EC 1.6.6.8) catalyzes the irreversible and NADPH-dependent reductive deamination of
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GMP Into IMP [3]. It converts nucleobase. nucleoside and nucleotide derivatives of G to A nucleotides, and maintains 
intracellular balance of A and G nucleotides. IMP dehydrogenase and GMP reductase share many regions of sequence similarity. One of these regions is centered on a cysteine residue thought [3] to be involved in binding IMP. This region was used as a signature pattern.
[0906] Consensus pattern: [LIVM]-[RK]-[LIVM]-G-[LIVM]-G-x-G-S-[LIVM]-C-x-T [C is the putative IMP-binding residue]-

[ 1] Collart F.R., Huberman E. J. Biol. Chem. 263:15769-15772(1988).
[ 2] Natsumeda Y., Ohno S., Kawasaki H., Konno Y., Weber G., Suzuki K. J. Biol. Chem. 265:5292-5295(1990).
[ 3] Andrews S.C., Guest J.R. Biochem. J. 255:35-43(1988).
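
Several consensus patterns in this section carry an annotation naming a functionally important residue, such as the cysteine in the IMP dehydrogenase / GMP reductase signature. The short Python sketch below shows one way such an annotated position could be reported when a sequence is scanned. It is an illustration under stated assumptions: the regular expression is a hand translation of the pattern given above, the function name find_binding_cysteines is hypothetical, and the test sequence is invented for the example.

```python
import re

# Hand translation of [LIVM]-[RK]-[LIVM]-G-[LIVM]-G-x-G-S-[LIVM]-C-x-T.
# The capturing group marks the cysteine annotated as the putative
# IMP-binding residue.
IMPDH_SIGNATURE = re.compile(r"[LIVM][RK][LIVM]G[LIVM]G.GS[LIVM](C).T")


def find_binding_cysteines(sequence: str) -> list:
    """Return the 1-based positions of the annotated cysteine in each match."""
    return [m.start(1) + 1 for m in IMPDH_SIGNATURE.finditer(sequence)]


# Invented sequence: four leading residues, one signature match, three trailing.
toy_sequence = "MSTA" + "IRVGIGPGSICQT" + "KLV"
print(find_binding_cysteines(toy_sequence))  # [15]
```

Reporting the residue position rather than only the match start makes it easier to compare hits against mutagenesis or structural data for the same protein.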

[0907] 306. (IPPc) Inositol polyphosphate phosphatase family, catalytic domain 
[0908] [1] York JD, Ponder JW, Chen ZW, Mathews FS, Majerus PW; Biochemistry 1994;33:13164-13171. [2] Jefferson AB, Auethavekiat V, Pot DA, Williams LT, Majerus PW; J Biol Chem 1997;272:5983-5988. [3] Zhang X, Jefferson AB, Auethavekiat V, Majerus PW; Proc Natl Acad Sci U S A 1995;92:4853-4856. [4] York JD, Majerus PW; Proc Natl Acad Sci U S A 1990;87:9548-9552. [5] Neuwald AF, York JD, Majerus PW; FEBS Lett 1991;294:16-18.

[0909] 307. IQ calmodulin-binding motif 

[1] Xie X, Harrison DH, Schlichting I, Sweet RM, Kalabokis VN, Szent-Gyorgyi AG, Cohen C; Nature 1994;368:306-312.

[2] Rhoads AR, Friedberg F; FASEB J 1997;11:331-340.

[0910] 308. Inosine-uridine preferring nucleoside hydrolase family signature (IU nuc hydro)

Inosine-uridine preferring nucleoside hydrolase (EC 3.2.2.1) (IU-nucleoside hydrolase or IUNH) is an enzyme first identified in protozoan [1] that catalyzes the hydrolysis of all of the commonly occurring purine and pyrimidine nucleosides into ribose and the associated base, but has a preference for inosine and uridine as substrates. This enzyme is important for these parasitic organisms, which are deficient in de novo synthesis of purines, to salvage the host purine nucleosides. IUNH from Crithidia fasciculata has been sequenced and characterized; it is a homotetrameric enzyme of subunits of 34 Kd. A histidine has been shown to be important for the catalytic mechanism; it acts as a proton donor to activate the hypoxanthine leaving group. IUNH is evolutionary related to a number of uncharacterized proteins from various biological sources, notably: - Escherichia coli hypothetical protein yaaF. - Escherichia coli hypothetical protein ybeK. - Escherichia coli hypothetical protein yeiK. - Fission yeast hypothetical protein SpAC17G8.02. - Yeast hypothetical protein YDR400w. - A hypothetical protein from the archaebacteria Desulfurolobus ambivalens. As a signature pattern for these proteins, a highly conserved region was selected located in the N-terminal extremity. This region contains four conserved aspartates that have been shown [2] to be located in the active site cavity.
[0911] Consensus pattern: D-x-D-[PT]-[GA]-x-D-D-[TAV]-[VI]-A-

[ 1] Gopaul D.N., Meyer S.L., Degano M., Sacchettini J.C., Schramm V.L. Biochemistry 35:5963-5970(1996).
[ 2] Degano M., Gopaul D.N., Scapin G., Schramm V.L., Sacchettini J.C. Biochemistry 35:5971-5981(1996).

[0912] 309. (Insulinase)
Insulinase family, zinc-binding region signature
(aka Peptidase_M16)

[0913] A number of proteases dependent on divalent cations for their activity have been shown [1,2] to belong to one family, on the basis of sequence similarity. These enzymes are listed below.

Insulinase (EC 3.4.24.56) (also known as insulysin or insulin-degrading enzyme or IDE), a cytoplasmic enzyme which seems to be involved in the cellular processing of insulin, glucagon and other small polypeptides.

Escherichia coli protease III (EC 3.4.24.55) (pitrilysin) (gene ptr), a periplasmic enzyme that degrades small peptides.

Mitochondrial processing peptidase (EC 3.4.24.64) (MPP). This enzyme removes the transit peptide from the precursor form of proteins imported from the cytoplasm across the mitochondrial inner membrane. It is composed of two nonidentical homologous subunits termed alpha and beta. The beta subunit seems to be catalytically active while the alpha subunit has probably lost its activity.

Nardilysin (EC 3.4.24.61) (N-arginine dibasic convertase or NRD convertase); this mammalian enzyme cleaves peptide substrates on the N-terminus of Arg residues in dibasic stretches.
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Klebsiella pneumoniae protein pqqF. This protein is required for the biosynthesi^^ r^^r^rpeTt!^^^^^^^^^ 

It is thought to be a protease that cleaves peptide bonds in a small peptide (gene pqqA), thus providing the glutamate and tyrosine residues necessary for the synthesis of PQQ.
Yeast protein AXL1, which is involved in axial budding [3]. 

Eimeria bovis sporozoite developmental protein.

Escherichia coli hypothetical protein yddC and HI1368, the corresponding Haemophilus influenzae protein.

Bacillus subtilis hypothetical protein ymxG.

Caenorhabditis elegans hypothetical proteins C28F5.4 and F56D2.1. 

[0914] It should be noted that in addition to the above enzymes, this family also includes the core proteins I and II of the mitochondrial bc1 complex (also called cytochrome c reductase or complex III), but the situation as to the activity or lack of activity of these subunits is quite complex:

- In mammals and yeast, core proteins I and II lack enzymatic activity.

- In Neurospora crassa and in potato, core protein I is equivalent to the beta subunit of MPP.

- In Euglena gracilis, core protein I seems to be active, while subunit II is inactive.

... inactive members.

Consensus pattern: G-x(8,9)-G-x-[STA]-H-[LIVMFY]-[LIVMC]-[DERN]-[HRKL]-[LMFAT]-x-[LFSTH]-x-[GSTAN]-[GST] [The two H's are zinc ligands] [E is the active site residue]
Sequences known to belong to this class detected by the pattern: all active members as well as all MPP alpha subunits and core II subunits. Does not detect ...

Note: these proteins belong to family M16 in the classification of peptidases [5].

[ 1] Rawlings N.D., Barrett A.J. Biochem. J. 275:389-391(1991).
[ 2] Braun H.-P., Schmitz U.K. Trends Biochem. Sci. 20:171-175(1995).
[ 3] Becker A.B., Roth R.A. Proc. Natl. Acad. Sci. U.S.A. 89:3835-3839(1992).
[ 4] Fujita A., Oka C., Arikawa Y., Katagai T., Tonouchi A., Kuhara S., Misumi Y. Nature 372:567-570(1994).
[ 5] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

... Crish JF, Murthy S, Rorke EA, Welter JF; J Invest Dermatol 1993;100:613-617.
[0920] 311. Isochorismatase family. This family are hydrolase enzymes.

[0921] Romao MJ, Turk D, Gomis-Ruth FX, Huber R, Schumacher G, Mollering H, Russmann L; J Mol Biol 1992;226:1111-1130.



- Vertebrate inositol polyphosphate 1-phosphatase.

- Bacterial protein cysQ. CysQ could help to control the pool of PAPS (3'-phosphoadenoside 5'-phosphosulfate).
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( 11 Neuwald A,R, York J.D., Majerus P.W. FEBS Lett. 294:16-18(1991). 

[ 2] Glaeser H.-U., Thomas D., Gaxiola R., Montrichard F., Surdin-Kerjan Y., Serrano R. EMBO J. 12:3105-3110(1993).

[ 3] Bone R., Springer J.P., Atack J.R. Proc. Natl. Acad. Sci. U.S.A. 89:10031-10035(1992).

[0924] 313. Ion transport protein 

[0925] This family contains sodium, potassium and calcium ion channels. Members of this family have 6 transmembrane helices, in which the last two helices flank a loop which determines ion selectivity. In some sub-families (e.g. Na channels) the domain is repeated four times, whereas in others (e.g. K channels) the protein forms a tetramer in the membrane. A bacterial structure of the protein is known for the last two helices, but it is not included in the Pfam family because it lacks the first four helices.
[0926] 314. Isocitrate and isopropylmalate dehydrogenases signature (isodh)

Isocitrate dehydrogenase (IDH) [1,2] is an important enzyme of carbohydrate metabolism which catalyzes the oxidative decarboxylation of isocitrate into alpha-ketoglutarate. IDH is either dependent on NAD+ (EC 1.1.1.41) or on NADP+ (EC 1.1.1.42). In eukaryotes there are at least three isozymes of IDH: two are located in the mitochondrial matrix (one NAD+-dependent, the other NADP+-dependent), while the third one (also NADP+-dependent) is cytoplasmic. In Escherichia coli the activity of a NADP+-dependent form of the enzyme is controlled by the phosphorylation of a serine residue; the phosphorylated form of IDH is completely inactivated. 3-isopropylmalate dehydrogenase (EC 1.1.1.85) (IMDH) [3,4] catalyzes the third step in the biosynthesis of leucine in bacteria and fungi, the oxidative decarboxylation of 3-isopropylmalate into 2-oxo-4-methylvalerate. Tartrate dehydrogenase (EC 1.1.1.93) [5] catalyzes the reduction of tartrate to oxaloglycolate. These enzymes are evolutionary related [1,3,4,5]. The best conserved region of these enzymes is a glycine-rich stretch of residues located in the C-terminal section. This region was used as a signature pattern.
[0927] Consensus pattern: [NS]-[LIMYT]-[FYDN]-G-[DNT]-[IMVY]-x-[STGDN]-[DN]-x(2)-[SGAP]-x(3,4)-G-[STG]-[LIVMPA]-G-[LIVMF]-

[ 1] Hurley J.H., Thorsness P.E., Ramalingam V., Helmers N.H., Koshland D.E. Jr., Stroud R.M. Proc. Natl. Acad. Sci. U.S.A. 86:8635-8639(1989).
[ 2] Cupp J.R., McAlister-Henn L. J. Biol. Chem. 266:22199-22205(1991).
[ 3] Imada K., Sato M., Tanaka N., Katsube Y., Matsuura Y., Oshima T. J. Mol. Biol. 222:725-738(1991).
[ 4] Zhang T., Koshland D.E. Jr. Protein Sci. 4:84-92(1995).
[ 5] Tipton P.A., Beecher B.S. Arch. Biochem. Biophys. 313:15-21(1994).

[0928] 315. Jacalin-like lectin domain.

[0929] Proteins containing this domain are lectins. It is found in 1 to 6 copies in these proteins. The domain is also found in the animal prostatic spermine-binding protein (Swiss:P15501).
[0930] [1] Sankaranarayanan R, Sekar K, Banerjee R, Sharma V, Surolia A, Vijayan M; Nat Struct Biol 1996;3:596-603.

[0931] 316. KH domain 

[0932] KH motifs probably bind RNA directly. Autoantibodies to Nova, a KH domain protein, cause paraneoplastic opsoclonus ataxia.

[1] Burd CG, Dreyfuss G; Science 1994;265:615-621.

[2] Musco G, Stier G, Joseph C, Castiglione Morelli MA, Nilges M, Gibson TJ, Pastore A; Cell 1996;85:237-245.
[0933] 317. Kelch motif 

[0934] The kelch motif was initially discovered in Kelch (Swiss:Q04652). In this protein there are six copies of the motif. It has been shown that Swiss:Q04652 is related to Galactose Oxidase [1], for which a structure has been solved [2]. The kelch motif forms a beta sheet. Several of these sheets associate to form a beta propeller structure as found in neur.

[0935] [1] Bork P, Doolittle RF; J Mol Biol 1994;236:1277-1282. [2] Ito N, Phillips SE, Stevens C, Ogel ZB, McPherson MJ, Keen JN, Yadav KD, Knowles PF; Nature 1991;350:87-90.

[0936] 318. Soybean trypsin inhibitor (Kunitz) protease inhibitors family signature 

[0937] The soybean trypsin inhibitor (Kunitz) family [1] is one of the numerous families of proteinase inhibitors. It comprises plant proteins which have inhibitory activity against serine proteinases from the trypsin and subtilisin families, thiol proteinases and aspartic proteinases, as well as some proteins that are probably involved in seed storage. This family is currently known to group the following proteins: - Trypsin inhibitors A, B, C, KTI1, and KTI2 from soybean. - Trypsin inhibitor DE3 from coral beans (Erythrina sp.). - Trypsin inhibitor DE5 from sandal bead tree. - Trypsin inhibitors 1A (WTI-1A), 1B (WTI-1B), and 2 (WTI-2) from goa bean. - Trypsin inhibitor from Acacia confusa. - Trypsin inhibitor from silk tree. - Chymotrypsin inhibitor 3 (WCI-3) from goa bean. - Cathepsin D inhibitors PDI and NDI from potato [2].
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p^?ns conLn from 170 to 200 amino acid residues and one or two.ntracham disuffide bonds. The best conserved 

region is found in their N-terminal section and is used as a signature pattern.

[0938] Consensus pattern: [LIVM]-x-D-x-[EDNTY]-[DG]-[RKHDENQ]-x-[LIVM]-x(5)-Y-x-[LIVM]-

[ 1] Laskowski M., Kato I. Annu. Rev. Biochem. 49:593-626(1980).
[ 2] Ritonja A., Krizaj I., Mesko P., Kopitar M., Lucovnik P., Strukelj B., Pungercar J., Buttle D.J., Barrett A.J., Turk V. FEBS Lett. 267:13-15(1990).

H.. McHency U Frio P.J.. Furttk D.B. Plant Mol. BW. 16:913-915(1991). 

KLo'JSrpl:S^lmi~..>.c,.^^ 

im, .ad ch.ln. It 1. found as a con,pc,n.nt of <^ folWng .n^fc -J'^S? SSS^d pS" <*lo- 



due] 



[ 1] Kaupplnen S.. SIggaard-Andersen M., von Wettstein-Knowles P. Carlsberg ^l^^^^^^^-^^^^^^^^^^^ 

2 WItkoLki A.. Rangan V.S.. Randhawa Z.I.. Amy CM Sm-ti. S. Eur^J^ ^^^''^^.^l^.^ll^^^^^ 

[ 3] Beck J., Ripka S., Siegner A., Schiltz E., Schweizer E. Eur. J. Biochem. 192:487-498(1990).
[ 4] Bibb M.J., Biro S., Motamedi H., Collins J.F., Hutchinson C.R. EMBO J. 8:2727-2736(1989).
[ 5] Sherman D.H., Malpartida F., Bibb M.J., Kieser H.M., Bibb M.J., Hopwood D.A. EMBO J. 8:2717-2725(1989).

[0941] 320. Kinesin motor domain signature and profile

Kinesin [1,2,3] is a microtubule-associated force-producing protein that may play a role in organelle transport. Kinesin ... a central alpha-helical coiled coil domain that mediates the heavy chain dimerization; and a small globular C-terminal domain which interacts with other proteins (such as the kinesin light chains), ...
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gation of nonexchange chromosomes during meiosis. - Human CENP-E [4]. CENP-E is a protein that associates with 
kinetochores during chromosome congression, relocates to the spindle midzone at anaphase, and is quantitatively discarded at the end of the cell division. CENP-E is probably an important motor molecule in chromosome movement and/or spindle elongation. - Human mitotic kinesin-like protein-1 (MKLP-1), a motor protein whose activity is directed toward the microtubule's plus end. - Yeast KAR3 protein, which is essential for yeast nuclear fusion during mating. KAR3 may mediate microtubule sliding during nuclear fusion and possibly mitosis. - Yeast CIN8 and KIP1 proteins, which are required for the assembly of the mitotic spindle. Both proteins seem to interact with spindle microtubules to produce an outwardly directed force acting upon the poles. - Fission yeast cut7 protein, which is essential for spindle body duplication during mitotic division. - Emericella nidulans bimC, which plays an important role in nuclear division. - Emericella nidulans klpA. - Caenorhabditis elegans unc-104, which may be required for the transport of substances needed for neuronal cell differentiation. - Caenorhabditis elegans osm-3. - Xenopus Eg5, which may be involved in mitosis. - Arabidopsis thaliana KatA, KatB and katC. - Chlamydomonas reinhardtii FLA10/KHP1 and KLP1. Both proteins seem to play a role in the rotation or twisting of the microtubules of the flagella. - Caenorhabditis elegans hypothetical protein T09A5.2. The kinesin motor domain is located in the N-terminal part of most of the above proteins, with the exception of KAR3, klpA, and ncd, where it is located in the C-terminal section. The kinesin motor domain contains about 330 amino acids. An ATP-binding motif of type A is found near position 80 to 90; the C-terminal half of the domain is involved in microtubule-binding. The signature pattern for that domain is derived from a conserved decapeptide inside the microtubule-binding part.

Consensus pattern: [GSA]-[KRHPSTQVM]-[LIVMF]-x-[LIVMF]-[IVC]-D-L-[AH]-G-[SAN]-E

[ 1] Bloom G.S., Endow S.A. Protein Prof. 2:1109-1171(1995).
[ 2] Vallee R.B., Shpetner H.S. Annu. Rev. Biochem. 59:909-932(1990).
[ 3] Brady S.T. Trends Cell Biol. 5:159-164(1995).
[ 4] Endow S.A. Trends Biochem. Sci. 16:221-225(1991). [E1]

[0942] 321. Ribosomal protein L15 signature

Ribosomal protein L15 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L15 is known to bind the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups: - Eubacterial L15. - Plant chloroplast L15 (nuclear-encoded). - Archaebacterial L15. - Vertebrate L27a. - Tetrahymena thermophila L29. - Fungi L27a (L29, CRP-1, CYH2). L15 is a protein of 144 to 154 amino-acid residues. As a signature pattern, a conserved region was selected in the C-terminal section of these proteins.

[0943] Consensus pattern: K-[LIVM](2)-[GASL]-x-[GT]-x-[LIVMA]-x(2,5)-[LIVM]-x-[LIVMF]-x(3,4)-[LIVMFCA]-[ST]-x(2)-A-x(3)-[LIVM]-x(3)-G

[0944] [ 1] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).
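
Patterns such as the L15 signature contain variable-length gaps, written x(2,5) or x(3,4), so a match does not have a single fixed length. The following Python sketch, offered only as an illustration, computes the shortest and longest stretch of residues a pattern can cover. The function name match_length_bounds is hypothetical and the sketch assumes the hyphen-separated notation used in this section.

```python
import re


def match_length_bounds(pattern: str) -> tuple:
    """Return (shortest, longest) number of residues a pattern can match.

    Hypothetical helper: every element counts once unless it carries a
    repeat such as (3) or a range such as (2,5).
    """
    shortest = longest = 0
    for element in pattern.replace(" ", "").strip("-").split("-"):
        m = re.fullmatch(r".+?(?:\((\d+)(?:,(\d+))?\))?", element)
        low = int(m.group(1)) if m.group(1) else 1
        high = int(m.group(2)) if m.group(2) else low
        shortest += low
        longest += high
    return shortest, longest


L15_PATTERN = (
    "K-[LIVM](2)-[GASL]-x-[GT]-x-[LIVMA]-x(2,5)-[LIVM]-x-[LIVMF]-"
    "x(3,4)-[LIVMFCA]-[ST]-x(2)-A-x(3)-[LIVM]-x(3)-G"
)
print(match_length_bounds(L15_PATTERN))  # (29, 33)
```

Knowing these bounds helps when choosing a window size for scanning, or when checking that a reported hit has a plausible length.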

[0945] 322. LBP / BPI / CETP family signature

The following mammalian lipid-binding serum glycoproteins belong to the same family [1,2,3]: - Lipopolysaccharide-binding protein (LBP). LBP binds to the lipid A moiety of bacterial lipopolysaccharides (LPS), a glycolipid present in the outer membrane of all Gram-negative bacteria. The LBP/LPS complex seems to interact with the CD14 receptor and may be responsible for the secretion of alpha-TNF. - Bactericidal permeability-increasing protein (BPI). Like LBP, BPI binds LPS and has a cytotoxic activity on Gram-negative bacteria. - Cholesteryl ester transfer protein (CETP). CETP is involved in the transfer of insoluble cholesteryl esters in reverse cholesterol transport. - Phospholipid transfer protein (PLTP). May play a key role in extracellular phospholipid transport and modulation of HDL particles. These proteins are structurally related and share many regions of sequence similarities. As a signature pattern one of these regions was selected, which is located in the N-terminal section of these proteins; a region which could be involved in the binding to the lipids [2].

Consensus pattern: [PA]-[GA]-[LIVMC]-x(2)-R-[IV]-[ST]-x(3)-L-x(5)-[EQ]-x(4)-[LIVM]-[EQK]-x(8)-P

[ 1] Schumann R.R., Leong S.R., Flaggs G.W., Gray P.W., Wright S.D., Mathison J.C., Tobias P.S., Ulevitch R.J. Science 249:1429-1431(1990).
[ 2] Gray P.W., Flaggs G., Leong S.R., Gumina R.J., Weiss J., Ooi C.E., Elsbach P. J. Biol. Chem. 264:9505-9509(1989).
[ 3] Day J.R., Albers J.J., Lofton-Day C.E., Gilbert T.L., Ching A.F.T., Grant F.J., O'Hara P.J., Marcovina S.M., Adolphson J.L. J. Biol. Chem. 269:9388-9391(1994).

[0946] 323. LIM domain signature and profile

Recently [1,2] a number of proteins have been found to contain a conserved cysteine-rich domain of about 60 amino-acid residues. These proteins are: - Caenorhabditis elegans mec-3; a protein required for the differentiation of the set of six touch receptor neurons in this nematode. - Caenorhabditis elegans lin-11; a protein required for the asymmetric
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^^^^^^^^ 

scfiptional enhancer found in the insulin gene. - Mammalian LH 2- ^ Jranscnp^a ^ nrotein apterous, required for 
con^l of cell differentiation in developing lymphoid and neural ^^''JVP^^- " ^-^^^^P^''^^^^^^ and LIMK-2. - 

the normal development of the wing and halter imagina ^'^f ^^^.^J^f f^^^^^^^ proteins of about 

Mammalian rt,ombotins. Rhombotin 1 (RBTN1 orTTG-1) ^"f J^^"^. '""J 
160 amino acids v^ose genes aredisrupted by chromosomalfrans o^^^^^^^ 

cystelne-rich protein (CRP), a 192 amino^cid protein ""f n,ay function 

cysteine-rich intestinal protein (CRIP), a small protein ^'''''^..f !°j^^T^!t3; protein. - Mouse testin. 

as an intracellular zinc transport protein. - Vertebrate P^'^'"'"' ^ ^y^"^"^^;^^^^^ ! Sunflower pollen specific protein 
Mouse testin Should notbe confused with rattestinv^ich.sathiolp«^^as^^^^^ 

SF3. - Chicken zyxin. Zyxin is a low-abundance adhesion plaque P^^^-J^^;^^^ Satg i^Sein RGA1/DBM1 . - 
- Yeast protein LRG1 Which is involved in sporu at.on [4L^^^^^^^^^^ 

Caenorhabditis elegans homeobox protein ceh-14. " Caenorhabdms e e^^^^^ P ^^^^ 

thetical protein YKR090w -Caeno^^^^^^^ 

;r:t3t;erc:~^^^^^^ 
fhrdi-riS^^^^^^^^^ 






[0947]

bind zinc] 



[1] Freyd G., Kim S.K., Horvitz H.R. Nature 344:876-879(1990).

[2] Baltz R., Evrard J.-L., Domon C., Steinmetz A. Plant Cell 4:1465-1466(1992).

[3] Sanchez-Garcia I., Rabbitts T.H. Trends Genet. 10:315-320(1994).

[4] [...] 3151-3154(1994).



(1993). 



1993;366:751-756. 



[0949] 325. Plant lipid transfer protein family signature (LTP)

[0950] Plant cells contain proteins, called lipid transfer proteins [...] of phospholipids and other lipids across
membranes. These proteins [...] is not yet known, could play a major role in membrane biogenesis by [...]
phospholipids [...] from their site of biosynthesis [...] representation.

[Schematic representation; disulfide-bond connector lines not recoverable]

xCxxxxCxxxxxxCCxxxxxxxxCxCxxxxxx[...]xxxCxxxxxxCxx

'C': conserved cysteine involved in a disulfide bond.
[...]: position of the pattern.
[0951] Consensus pattern: [LIVM]-[PA]-x(2)-C-x-[LIVM]-x-[LIVM]-x-[LIVMFY]-x-[LIVM]-[ST]-x(3)-[DN]-C-x(2)-[LIVM]
[The two C's are involved in disulfide bonds]

[1] Wirtz K.W.A. Annu. Rev. Biochem. 60:73-99(1991).
[2] Arondel V., Kader J.C. Experientia 46:579-585(1990).

[3] Ohlrogge J.B., Browse J., Somerville C.R. Biochim. Biophys. Acta 1082:1-26(1991).

[0952] 326. (LAMP) Lysosome-associated membrane glycoproteins signatures 

Lysosome-associated membrane glycoproteins (lamp) [1] are integral membrane proteins, specific to lysosomes, and
whose exact biological function is not yet clear. Structurally, the lamp proteins consist of two internally homologous
lysosome-luminal domains separated by a proline-rich hinge region; at the C-terminal extremity there is a transmembrane
region followed by a very short cytoplasmic tail. In each of the duplicated domains, there are two conserved
disulfide bonds. This structure is schematically represented in the figure below.

[Schematic representation; disulfide-bond connector lines not recoverable]

xCxxxxxCxxxxxxxxxxxxCxxxxxCxxxxxxxxxCxxxxxCxxxxxxxxxxxxCxxxxxCxxxxxxxx
< xHingex ><TM><C>

In mammals, there are two closely related types of lamp: lamp-1 and lamp-2. In chicken lamp-1 is known as LEP100. The
macrophage protein CD68 (or macrosialin) [2] is a heavily glycosylated integral membrane protein whose structure
consists of a mucin-like domain followed by a proline-rich hinge; a single lamp-like domain; a transmembrane region
and a short cytoplasmic tail. Two signature patterns for this family of proteins were developed. The first one is centered
on the first conserved cysteine of the duplicated domains. The second corresponds to a region that includes the
extremity of the second domain, the totality of the transmembrane region and the cytoplasmic tail.
[0953] Consensus pattern: [STA]-C-[LIVM]-[LIVMFYW]-A-x-[LIVMFYW]-x(3)-[LIVMFYW]-x(3)-Y [C is involved in a
disulfide bond]

Consensus pattern: C-x(2)-D-x(3,4)-[LIVM](2)-P-[LIVM]-x-[LIVM]-G-x(2)-[LIVM]-x-G-[LIVM](2)-x-[LIVM](4)-A-[FY]-x-
[LIVM]-x(2)-[KR]-[RH]-x(1,2)-[STAG](2)-Y-[EQ] [C is involved in a disulfide bond]

[ 1] Fukuda M. J. Biol. Chem. 266:21327-21330(1991). 

[2] Holness C.L., da Silva R.P., Fawcett J., Gordon S., Simmons D.L. J. Biol. Chem. 268:9661-9666(1993). 
[0954] 327. Lipolytic enzymes "G-D-S-L" family, serine active site 

[0955] Recently [1], a family of lipolytic enzymes has been characterized. This family currently consists of the following
proteins: 

Aeromonas hydrophila lipase/phosphatidylcholine-sterol acyltransferase. 
Xenorhabdus luminescens lipase 1 . 
Vibrio mimicus arylesterase. 
Escherichia coli acyl-coA thioesterase I (gene tesA). 
Vibrio parahaemolyticus thermolabile hemolysin/atypical phospholipase. 

Rabbit phospholipase AdRab-B, an intestinal brush border protein with esterase and phospholipase A/lysophospholipase
activity that could be involved in the uptake of dietary lipids. AdRab-B contains four repeats of about
320 amino acids.

Arabidopsis thaliana and Brassica napus anther-specific proline-rich protein APG.

A Pseudomonas putida hypothetical protein in trpE-trpG intergenic region.

A serine has been identified as part of the active site in the Aeromonas, Vibrio mimicus and Escherichia coli enzymes.
It is located in a conserved sequence motif that can be used as a signature pattern for these proteins.

Consensus pattern: [LIVMFYAG](4)-G-D-S-[LIVM]-x(1,2)-[TAG]-G [S is the active site residue]

[0956] 328. (Lipoprotein 4) Prokaryotic membrane lipoprotein lipid attachment site. In prokaryotes, membrane
lipoproteins are synthesized with a precursor signal peptide, which is cleaved by a specific lipoprotein signal peptidase
(signal peptidase II). The peptidase recognizes a conserved sequence and cuts upstream of a cysteine residue to which
a glyceride-fatty acid lipid is attached [1]. Some of the proteins known to undergo such processing currently include
(for recent listings see [1,2,3]): - Major outer membrane lipoprotein (murein-lipoproteins) (gene lpp). - Escherichia coli
lipoprotein-28 (gene nlpA). - Escherichia coli lipoprotein-34 (gene nlpB). - Escherichia coli lipoprotein nlpC. -
Escherichia coli lipoprotein nlpD. - Escherichia coli osmotically inducible lipoprotein B (gene osmB). - Escherichia coli
osmotically inducible lipoprotein E (gene osmE). - Escherichia coli peptidoglycan-associated lipoprotein (gene pal). -
Escherichia coli rare lipoproteins A and B (genes rplA and rplB). - Escherichia coli copper homeostasis protein cutF
(or nlpE). - Escherichia coli plasmids traT proteins. - Escherichia coli Col plasmids lysis proteins. - A number of Bacillus
beta-lactamases. - Bacillus subtilis periplasmic oligopeptide-binding protein (gene oppA). - Borrelia burgdorferi outer
surface proteins A and B (genes ospA and ospB). - Borrelia hermsii variable major protein 21 (gene vmp21) and 7
(gene vmp7). - Chlamydia trachomatis outer membrane protein 3 (gene omp3). - Fibrobacter succinogenes
endoglucanase cel-3. - Haemophilus influenzae proteins Pal and Pcp. - Klebsiella pullulanase (gene pulA). - Klebsiella
pullulanase secretion protein pulS. - Mycoplasma hyorhinis protein p37. - Mycoplasma hyorhinis variant surface antigens
A, B and C (genes vlpABC). - Neisseria outer membrane protein H.8. - Pseudomonas aeruginosa lipopeptide (gene
lppL). - Pseudomonas solanacearum endoglucanase egl. - Rhodopseudomonas viridis reaction center cytochrome
subunit (gene cytC). - Rickettsia 17 Kd antigen. - Shigella flexneri invasion plasmid proteins mxiJ and mxiM. -
Streptococcus pneumoniae oligopeptide transport protein A (gene amiA). - Treponema pallidum 34 Kd antigen. - Treponema
pallidum membrane protein A (gene tmpA). - Vibrio harveyi chitobiase (gene chb). - Yersinia virulence plasmid protein
yscJ. - Halocyanin from Natronobacterium pharaonis [4], a membrane associated copper-binding protein (this is the first
archaebacterial protein known to be modified in such a fashion). From the precursor sequences of all these proteins,
a consensus pattern and a set of rules to identify this type of post-translational modification was derived.
[0957] Consensus pattern: {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C [C is the lipid attachment
site] Additional rules: 1) The cysteine must be between positions 15 and 35 of the sequence in consideration. 2) There
must be at least one Lys or one Arg in the first seven positions of the sequence.
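The consensus pattern and the two additional rules above can be combined into a single check on a precursor sequence. The following is a minimal sketch, assuming one-letter amino acid codes and 1-based residue numbering; the function name `find_lipid_attachment_site` and the example sequences in the usage note are hypothetical and serve only as illustration.

```python
import re

# {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C written as a regular expression.
LIPOBOX = re.compile(r"[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C")


def find_lipid_attachment_site(precursor: str):
    """Return the 1-based position of the candidate lipid-attachment cysteine, or None."""
    seq = precursor.upper()
    # Rule 2: at least one Lys or Arg within the first seven positions.
    if not any(residue in "KR" for residue in seq[:7]):
        return None
    # Try every start position so that overlapping candidates are not skipped.
    for start in range(len(seq)):
        match = LIPOBOX.match(seq, start)
        if match is None:
            continue
        cysteine_position = match.end()  # the pattern ends on the cysteine
        # Rule 1: the cysteine must lie between positions 15 and 35.
        if 15 <= cysteine_position <= 35:
            return cysteine_position
    return None
```

For the hypothetical precursor `MKATKLVLGAVILGSTLLAGCSSNAK` the function reports the cysteine at position 21; removing the lysines from the first seven positions, or moving the cysteine before position 15, makes it return `None`.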

[1] Hayashi S., Wu H.C. J. Bioenerg. Biomembr. 22:451-471(1990).
[2] Klein P., Somorjai R.L., Lau P.C.K. Protein Eng. 2:15-20(1988).
[3] von Heijne G. Protein Eng. 2:531-534(1989).
[4] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D., Engelhard M. J. Biol. Chem. 269:14939-14945(1994).

[0958] 329. (Lipoprotein 5) Prokaryotic membrane lipoprotein lipid attachment site. In prokaryotes, membrane
lipoproteins are synthesized with a precursor signal peptide, which is cleaved by a specific lipoprotein signal peptidase
(signal peptidase II). The peptidase recognizes a conserved sequence and cuts upstream of a cysteine residue to
which a glyceride-fatty acid lipid is attached [1]. Some of the proteins known to undergo such processing currently
include (for recent listings see [1,2,3]): - Major outer membrane lipoprotein (murein-lipoproteins) (gene lpp). -
Escherichia coli lipoprotein-28 (gene nlpA). - Escherichia coli lipoprotein-34 (gene nlpB). - Escherichia coli lipoprotein
nlpC. - Escherichia coli lipoprotein nlpD. - Escherichia coli osmotically inducible lipoprotein B (gene osmB). - Escherichia
coli osmotically inducible lipoprotein E (gene osmE). - Escherichia coli peptidoglycan-associated lipoprotein (gene pal). -
Escherichia coli rare lipoproteins A and B (genes rplA and rplB). - Escherichia coli copper homeostasis protein cutF
(or nlpE). - Escherichia coli plasmids traT proteins. - Escherichia coli Col plasmids lysis proteins. - A number of Bacillus
beta-lactamases. - Bacillus subtilis periplasmic oligopeptide-binding protein (gene oppA). - Borrelia burgdorferi outer
surface proteins A and B (genes ospA and ospB). - Borrelia hermsii variable major protein 21 (gene vmp21) and 7
(gene vmp7). - Chlamydia trachomatis outer membrane protein 3 (gene omp3). - Fibrobacter succinogenes
endoglucanase cel-3. - Haemophilus influenzae proteins Pal and Pcp. - Klebsiella pullulanase (gene pulA). - Klebsiella
pullulanase secretion protein pulS. - Mycoplasma hyorhinis protein p37. - Mycoplasma hyorhinis variant surface antigens
A, B and C (genes vlpABC). - Neisseria outer membrane protein H.8. - Pseudomonas aeruginosa lipopeptide (gene
lppL). - Pseudomonas solanacearum endoglucanase egl. - Rhodopseudomonas viridis reaction center cytochrome
subunit (gene cytC). - Rickettsia 17 Kd antigen. - Shigella flexneri invasion plasmid proteins mxiJ and mxiM. -
Streptococcus pneumoniae oligopeptide transport protein A (gene amiA). - Treponema pallidum 34 Kd antigen. - Treponema
pallidum membrane protein A (gene tmpA). - Vibrio harveyi chitobiase (gene chb). - Yersinia virulence plasmid protein
yscJ. - Halocyanin from Natronobacterium pharaonis [4], a membrane associated copper-binding protein (this is the first
archaebacterial protein known to be modified in such a fashion). From the precursor sequences of all these proteins,
a consensus pattern and a set of rules to identify this type of post-translational modification was derived.
[0959] Consensus pattern: {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C [C is the lipid attachment
site] Additional rules: 1) The cysteine must be between positions 15 and 35 of the sequence in consideration. 2) There
must be at least one Lys or one Arg in the first seven positions of the sequence.
[0960] [1] Hayashi S., Wu H.C. J. Bioenerg. Biomembr. 22:451-471(1990). [2] Klein P., Somorjai R.L., Lau P.C.K.
Protein Eng. 2:15-20(1988). [3] von Heijne G. Protein Eng. 2:531-534(1989). [4] Mattar S., Scharf B., Kent S.B.H.,
Rodewald K., Oesterhelt D., Engelhard M. J. Biol. Chem. 269:14939-14945(1994).
[0961] 330. (Lum binding) Riboflavin synthase alpha chain family Lum-binding site signature. The following proteins
have been shown [1,2] to be structurally and evolutionarily related: - Riboflavin synthase alpha chain (RS-alpha) (gene
ribC in Escherichia coli, ribB in Bacillus subtilis and Photobacterium leiognathi, RIB5 in yeast). This enzyme synthesizes
riboflavin from two moles of 6,7-dimethyl-8-(1'-D-ribityl)lumazine (Lum), a pteridine derivative. - Photobacterium
phosphoreum lumazine protein (LumP) (gene luxL). LumP is a protein that modulates the color of the bioluminescence
emission of bacterial luciferase. In the presence of LumP, light emission is shifted to higher energy values (shorter
wavelength). LumP binds non-covalently to 6,7-dimethyl-8-(1'-D-ribityl)lumazine. - Vibrio fischeri yellow fluorescent
protein (YFP) (gene luxY). Like LumP, YFP modulates light emission but towards a longer wavelength. YFP binds
non-covalently to FMN. These proteins seem to have evolved from the duplication of a domain of about 100 residues. In its
C-terminal section, this domain contains a conserved motif [KR]-V-N-[LI]-E which has been proposed to be the binding
site for Lum. RS-alpha, which binds two molecules of Lum, has two perfect copies of this motif, while LumP, which binds
one molecule of Lum, has a Glu instead of Lys/Arg in the first position of the second copy of the motif. Similarly, YFP,
which binds to one molecule of FMN, also seems to have a potentially dysfunctional binding site by substitution of Gly
for Glu in the last position of the first copy of the motif. Our signature pattern includes the Lum-binding motif.

[0962] Consensus pattern: [LIVMF]-x(5)-G-[STADNQ]-[KREQIYW]-V-N-[LIVM]-E
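The distinction drawn above between a perfect copy of the core motif [KR]-V-N-[LI]-E and a copy that only satisfies the broader signature pattern can be illustrated with a short scan. This is a sketch under stated assumptions: the function name `lum_motif_report` is hypothetical, and the sequence in the usage note is synthetic, not taken from any of the proteins named in the text.

```python
import re

# Core Lum-binding motif [KR]-V-N-[LI]-E as described in the text.
CORE_MOTIF = re.compile(r"[KR]VN[LI]E")
# Signature pattern [LIVMF]-x(5)-G-[STADNQ]-[KREQIYW]-V-N-[LIVM]-E.
SIGNATURE = re.compile(r"[LIVMF].{5}G[STADNQ][KREQIYW]VN[LIVM]E")


def lum_motif_report(sequence: str) -> dict:
    """Count perfect core motifs and list 1-based start positions of signature hits."""
    seq = sequence.upper()
    return {
        "perfect_core_copies": len(CORE_MOTIF.findall(seq)),
        "signature_hits": [m.start() + 1 for m in SIGNATURE.finditer(seq)],
    }
```

On the synthetic sequence `AAALAAAAAGSKVNLEAAAALAAAAAGTEVNIE` the report lists two signature hits but only one perfect core copy, because the second copy carries Glu in place of Lys/Arg in its first position, which is the situation the text describes for LumP.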

[1] O'Kane D.J., Woodward B., Lee J., Prasher D.C. Proc. Natl. Acad. Sci. U.S.A. 88:1100-1104(1991).
[2] O'Kane D.J., Prasher D.C. Mol. Microbiol. 6:443-449(1992).

[0963] 331. Lysyl oxidase putative copper-binding region signature

Lysyl oxidase (LOX) [1] is an extracellular copper-dependent enzyme that catalyzes the oxidative deamination of
peptidyl lysine residues in precursors of various collagens and elastins. The deaminated lysines are then able to form
aldehyde cross-links. LOX binds a single copper atom which seems to reside within an octahedral coordination complex
which includes at least three histidine ligands. Four histidine residues are clustered in a central region of the enzyme.
This region is thought to be involved in copper-binding and is called the 'copper-talon' [1]. This region was used as a
signature pattern.

[0964] Consensus pattern: W-E-W-H-S-C-H-Q-H-Y-H

[0965] [1] Krebs C.J., Krawetz S.A. Biochim. Biophys. Acta 1202:7-12(1993).
[0966] 332. Metallo-beta-lactamase superfamily (lactamase_B) 
[0967] [1] Neuwald AF, Liu JS, Lipman DJ, Lawrence CE, Nucleic Acids Res 1997;25:1665-1677. [2] Carfi A, Pares
S, Duee E, Galleni M, Duez C, Frere JM, Dideberg O, EMBO J 1995;14:4914-4921.
[0968] 333. L-lactate dehydrogenase active site (ldh1)

L-lactate dehydrogenase (EC 1.1.1.27) (LDH) [1] catalyzes the reversible NAD-dependent interconversion of pyruvate
and L-lactate. In vertebrate muscles and in lactic acid bacteria it represents the final step in anaerobic glycolysis. This
tetrameric enzyme is present in prokaryotic and eukaryotic organisms. In vertebrates there are three isozymes of LDH:
the M form (LDH-A), found predominantly in muscle tissues; the H form (LDH-B), found in heart muscle; and the X form
(LDH-C), found only in the spermatozoa of mammals and birds. In birds and crocodilian eye lenses, LDH-B serves as
a structural protein and is known as epsilon-crystallin [2]. L-2-hydroxyisocaproate dehydrogenase (EC 1.1.1.-) (L-hicDH)
[3] catalyzes the reversible and stereospecific interconversion between 2-ketocarboxylic acids and
L-2-hydroxy-carboxylic acids. L-hicDH is evolutionarily related to LDH's. As a signature for LDH's a region was selected that includes
a conserved histidine which is essential to the catalytic mechanism.
[0969] Consensus pattern: [LIVMA]-G-[EQ]-H-G-[DN]-[ST] [H is the active site residue]
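Because the pattern above places the active-site histidine at its fourth position, the residue number of that histidine follows directly from the start of any match. A minimal sketch, assuming one-letter codes and 1-based numbering; the function name `ldh_active_site_histidines` is hypothetical and the sequence in the usage note is synthetic.

```python
import re

# [LIVMA]-G-[EQ]-H-G-[DN]-[ST], with H as the active site residue.
LDH_SIGNATURE = re.compile(r"[LIVMA]G[EQ]HG[DN][ST]")


def ldh_active_site_histidines(sequence: str) -> list:
    """Return the 1-based position of the histidine in every signature match."""
    seq = sequence.upper()
    # The histidine is the fourth residue of the pattern, so its 1-based
    # position is the 0-based match start plus 4.
    return [m.start() + 4 for m in LDH_SIGNATURE.finditer(seq)]
```

For the synthetic sequence `MKKVGEHGDSAA` the function returns `[7]`, the position of the histidine inside the matched `VGEHGDS` segment.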

[1] Abad-Zapatero C., Griffith J.P., Sussman J.L., Rossmann M.G. J. Mol. Biol. 198:445-467(1987).
[2] Hendriks W., Mulders J.W.M., Bibby M.A., Slingsby C., Bloemendal H., de Jong W.W. Proc. Natl. Acad. Sci.
U.S.A. 85:7114-7118(1988).
[3] Lerch H.-P., Frank R., Collins J. Gene 83:263-270(1989).

[0970] Malate dehydrogenase active site signature (ldh2)

Malate dehydrogenase (EC 1.1.1.37) (MDH) [1,2] catalyzes the interconversion of malate and oxaloacetate utilizing the
NAD/NADH cofactor system. The enzyme participates in the citric acid cycle and exists in all aerobic organisms. While
prokaryotic organisms contain a single form of MDH, in eukaryotic cells there are two isozymes: one which is located
in the mitochondrial matrix and the other in the cytoplasm. Fungi and plants also harbor a glyoxysomal form which
functions in the glyoxylate pathway. In plant chloroplasts there is an additional NADP-dependent form of MDH (EC
1.1.1.82) which is essential for both the universal C3 photosynthesis (Calvin) cycle and the more specialized C4 cycle.
As a signature pattern for this enzyme a region was chosen that includes two residues involved in the catalytic
mechanism [3]: an aspartic acid which is involved in a proton relay mechanism, and an arginine which binds the substrate.
[0971] Consensus pattern: [LIVM]-T-[TRKMN]-L-D-x(2)-R-[STA]-x(3)-[LIVMFY] [D and R are the active site residues]

[1] McAlister-Henn L. Trends Biochem. Sci. 13:178-181(1988).
[2] Gietl C. Biochim. Biophys. Acta 1100:217-234(1992).
[3] Birktoft J.J., Rhodes G., Banaszak L.J. Biochemistry 28:6065-6081(1989).
[4] Cendrin F., Chroboczek J., Zaccai G., Eisenberg H., Mevarech M. Biochemistry 32:4308-4313(1993).

[0975] Consensus pattern: [LIV]-[STAG]-V-[DEQV]-[FLI]-D-[ST] [D binds manganese and calcium]
Consensus pattern: [LIV]-x-[EDQ]-[FYWKR]-V-x-[LIVF]-G-[LF]-[ST]

[ 1] Sharon N., Lis H. FASEB J. 4:3198-320(1990). 

[ 2] Lis H., Sharon N. Annu. Rev. Biochem. 55:33-37(1986). 

[...] of chromatin structure. H1 links nucleosomes into higher order structures. Histone H1 is replaced by histone
H5 in some cell types.
[0979] [1] Ramakrishnan V, Finch JT, Graziano V, Lee PL, Sweet RM, Nature 1993;362:219-223.
or two of these [3,18]. A signature pattern was built around the first, common to all outlier and kernel lipocalins, which
occurs near the start of the first beta-strand.

[0981] Consensus pattern: [DENG]-x-[DENQGSTARK]-x(0,2)-[DENQARK]-[LIVFY]-{CP}-G-{C}-W-[FYWLRH]-x-
[LIVMTA]

Note: it is suggested, on the basis of similarities of structure, function, and sequence, that this family forms an overall
superfamily, called the calycins, with the avidin/streptavidin <PDOC00499> and the cytosolic fatty-acid binding proteins
<PDOC00188> families [3,19].

[1] Cowan S.W., Newcomer M.E., Jones T.A. Proteins 8:44-61(1990).
[2] Igaraishi M., Nagata A., Toh H., Urade H., Hayaishi N. Proc. Natl. Acad. Sci. U.S.A. 89:5376-5380(1992).
[3] Flower D.R., North A.C.T., Attwood T.K. Protein Sci. 2:753-761(1993).
[4] Godovac-Zimmermann J. Trends Biochem. Sci. 13:64-66(1988).
[5] Pervaiz S., Brew K. FASEB J. 1:209-214(1987).
[6] Kremer J.M.H., Wilting J., Janssen L.H.M. Pharmacol. Rev. 40:1-47(1989).
[7] Haefliger J.-A., Peitsch M.C., Jenne D., Tschopp J. Mol. Immunol. 28:123-131(1991).
[8] Keen J.N., Caceres I., Eliopoulos E.E., Zagalsky P.F., Findlay J.B.C. Eur. J. Biochem. 197:407-417(1991).
[9] Newcomer M.E. Structure 1:7-18(1993).
[10] Collet C., Joseph R. Biochim. Biophys. Acta 1167:219-222(1993).
[11] Kjeldsen L., Johnsen A.H., Sengelov H., Borregaard N. J. Biol. Chem. 268:10425-10432(1993).
[12] Peitsch M.C., Boguski M.S. Trends Biochem. Sci. 16:363-363(1991).
[13] Miyawaki A., Matsushita Y.R., Ryo Y., Mikoshiba T. EMBO J. 13:5835-5842(1994).
[14] Keck K., Ahlers C., Schmale H. Eur. J. Biochem. 221:905-916(1994).
[15] Achen M.G., Harms P.J., Thomas T., Richardson S.J., Wettenhall R.E.H., Schreiber G. J. Biol. Chem.
267:23170-23174(1992).
[16] Morel L., Dufaure J.-P., Depeiges A. J. Biol. Chem. 268:10274-10281(1993).
[17] Bishop R.E., Penfold S.S., Frost L.S., Holtje J.V., Weiner J.H. J. Biol. Chem. 270:23097-23103(1995).
[18] Flower D.R., North A.C.T., Attwood T.K. Biochem. Biophys. Res. Commun. 180:69-74(1991).
[19] Flower D.R. FEBS Lett. 333:99-102(1993).

[0982] Cytosolic fatty-acid binding proteins signature (lip2)

A number of low molecular weight proteins which bind fatty acids and other organic anions are present in the cytosol
[1,2]. Most of them are structurally related and have probably diverged from a common ancestor. This structure is a
ten stranded antiparallel beta-barrel, albeit with a wide discontinuity between the fourth and fifth strands, with a repeated
+1 topology enclosing an internal ligand binding site [2,7]. Proteins known to belong to this family include: - Six
tissue-specific types of fatty acid binding proteins (FABPs) found in liver, intestine, heart, epidermal, adipocyte, brain/retina.
Heart FABP is also known as mammary-derived growth inhibitor (MDGI), a protein that reversibly inhibits proliferation
of mammary carcinoma cells. Epidermal FABP is also known as psoriasis-associated FABP [3]. - Insect muscle fatty
acid-binding proteins. - Testis lipid binding protein (TLBP). - Cellular retinol-binding proteins I and II (CRBP). - Cellular
retinoic acid-binding protein (CRABP). - Gastrotropin, an ileal protein which stimulates gastric acid and pepsinogen
secretion. It seems that gastrotropin binds to bile salts and bilirubins. - Fatty acid binding proteins MFB1 and MFB2
from the midgut of the insect Manduca sexta [4]. In addition to the above cytosolic proteins, this family also includes: -
Myelin P2 protein, which may be a lipid transport protein in Schwann cells. P2 is associated with the lipid bilayer of
myelin. - Schistosoma mansoni protein Sm14 [5] which seems to be involved in the transport of fatty acids. - Ascaris
suum p18, a secreted protein that may play a role in sequestering potentially toxic fatty acids and their peroxidation
products or that may be involved in the maintenance of the impermeable lipid layer of the eggshell. - Hypothetical fatty
acid-binding proteins F40F4.2, F40F4.3, F40F4.4 and ZK742.5 from Caenorhabditis elegans. As a signature pattern
for these proteins a segment from the N-terminal extremity was used.

[0983] Consensus pattern: [GSAIVK]-x-[FYW]-x-[LIVMF]-x(4)-[NHG]-[FY]-[DE]-x-[LIVMFY]-[LIVM]-x(2)-[LIVMAKR]

Note: it is suggested, on the basis of similarities of structure, function, and sequence, that this family forms an overall
superfamily, called the calycins, with the lipocalin <PDOC00187> and avidin/streptavidin <PDOC00499> families [6,7].

[1] Bernier I., Jolles P. Biochimie 69:1127-1152(1987).
[2] Veerkamp J.H., Peeters R.A., Maatman R.G.H.J. Biochim. Biophys. Acta 1081:1-24(1991).
[3] Siegenthaler G., Hotz R., Chatellard-Gruaz D., Didierjean L., Hellman U., Saurat J.-H. Biochem. J.
302:363-371(1994).
[4] Smith A.F., Tsuchida K., Hanneman E., Suzuki T.C., Wells M.A. J. Biol. Chem. 267:380-384(1992).
[5] Moser D., Tendler M., Griffiths G., Klinkert M.-Q. J. Biol. Chem. 266:8447-8454(1991).
[6] Flower D.R., North A.C.T., Attwood T.K. Protein Sci. 2:753-761(1993).
[7] Flower D.R. FEBS Lett. 333:99-102(1993).

[...] dioxygenases which catalyze the hydroperoxidation of [...] of diverse aspects of plant physiology including [...]
to wounding [...]. [...] a number of lipoxygenase isozymes are involved in the [...] and leukotrienes [2]. Sequence
data is available for the following lipoxygenases: - Plant lipoxygenases (EC 1.13.11.12). Plants express a variety of
cytosolic isozymes as well as what seems [3] to be a chloroplast isozyme. - Mammalian arachidonate 5-lipoxygenase
(EC 1.13.11.34). - Mammalian arachidonate 12-lipoxygenase (EC 1.13.11.31). - Mammalian erythroid cell-specific
15-lipoxygenase (EC 1.13.11.33). The iron atom in lipoxygenases is bound by four ligands, three of which are histidine
residues [4]. Six histidines are conserved in all lipoxygenase sequences, five of them are found clustered in a stretch
of 40 amino acids. This region contains two of the three [...] histidines have been shown [5] to be important for the
activity of lipoxygenases. As signatures for this family of enzymes two patterns in the region of the histidine cluster
were selected. The first pattern contains the first three conserved histidines and the second pattern includes the
fourth and the fifth.

Consensus pattern: H-[EQ]-x(3)-H-x-[LM]-[NQRC]-[GST]-H-[LIVMSTAC](3)-E [The second and third H's bind iron]
[0985] Consensus pattern: [LIVMA]-H-P-[LIVM]-x-[KRQ]-[LIVMF](2)-x-[AP]-H

[1] Vick B.A., Zimmerman D.C. (In) Biochemistry of plants: A comprehensive treatise, Stumpf P.K., Ed., Vol. 9,
pp.53-90, Academic Press, New-York, (1987).
[2] Needleman P., Turk J., Jakschik B.A., Morrison A.R., Lefkowith J.B. Annu. Rev. Biochem. 55:69-102(1986).
[3] Peng Y.L., Shirano Y., Ohta H., Hibino T., Tanaka K., Shibata D. J. Biol. Chem. 269:3755-3761(1994).
[4] Boyington J.C., Gaffney B.J., Amzel L.M. Science 260:1482-1486(1993).
[5] Steczko J., Donoho G.P., Clemens J.C., Dixon J.E., Axelrod B. Biochemistry 31:4053-4057(1992).

[0986] 339. Fumarate lyases signature (lyase_1)

A number of enzymes, belonging to the lyase class, for which fumarate is a substrate have been shown [1,2] to share
a short conserved sequence around a [...]. [...] are thermostable and tetrameric and are found in prokaryotes (as
for example [...]) as well as in eukaryotes. The sequence of the two classes of fumarases [...] of aspartate to
fumarate and ammonia. This reaction is analogous to that catalyzed [...] rather than water is involved in the
trans-elimination reaction. - Arginosuccinase (EC 4.3.2.1) (argininosuccinate lyase), which catalyzes the formation of
arginine and fumarate from argininosuccinate, the last step in the biosynthesis of arginine. - Adenylosuccinase
(EC 4.3.2.2) (adenylosuccinate lyase) [3], which catalyzes the eighth step in the de novo biosynthesis of purines, the
formation of 5'-phosphoribosyl-5-amino-4-imidazolecarboxamide and [...] phosphoribosyl)-4-(N-succino-carbo[...].
That enzyme can also catalyze the formation of [...] from adenylosuccinate. - Pseudomonas putida
3-carboxy-cis,cis-muconate [...] (3-carboxymuconate lactonizing enzyme) (gene pcaB) [4], an enzyme involved in aromatic
acids catabolism.
[0987] Consensus pattern: G-S-x(2)-M-x(2)-K-x-N

[1] Woods S.A., Shwartzbach S.D., Guest J.R. Biochim. Biophys. Acta 954:14-26(1988).
[2] Woods S.A., Miles J.S., Guest J.R. FEMS Microbiol. Lett. 51:181-186(1988).
[3] Zalkin H., Dixon J.E. Prog. Nucleic Acid Res. Mol. Biol. 42:259-287(1992).
[4] Williams S.E., Woolridge E.M., Ransom S.C., Landro J.A., Babbitt P.C., Kozarich J.W. Biochemistry
31:9768-9776(1992).

[...] residues [1,2,3]. The latter shows some similarities [4] with that of various other families of DNA-dependent
ATPases. Eukaryotes seem to possess a family of six proteins that contain this domain. They were first
identified in yeast [...] them have a direct role in the initiation of chromosomal DNA replication by interacting
directly with autonomously replicating sequences (ARS). They were thus called [...] gene symbols prefixed by MCM.
These six proteins are: - MCM2, also known as cdc19 (in S.pombe).
- MCM3, also known as DNA polymerase alpha holoenzyme-associated protein P1, RLF beta subunit or ROA. - MCM4,
also known as CDC54, cdc21 (in S.pombe) or dpa (in Drosophila). - MCM5, also known as CDC46 or nda4 (in
S.pombe). - MCM6, also known as mis5 (in S.pombe). - MCM7, also known as CDC47 or Prolifera (in A.thaliana). This
family is also present in archaebacteria. In Methanococcus jannaschii there are four members: MJ0363, MJ0961, MJ1489
and MJECL13. The presence of a putative ATP-binding domain implies that these proteins may be involved in an
ATP-consuming step in the initiation of DNA replication in eukaryotes. As a signature pattern, a perfectly conserved region
was selected that represents a special version of the B motif found in ATP-binding proteins.
[0989] Consensus pattern: G-[IVT]-[LVAC](2)-[IVT]-D-[DE]-[FL]-[DNST]

[1] Coxon A., Maundrell K., Kearsey S.E. Nucleic Acids Res. 20:5571-5577(1992).
[2] Hu B., Burkhart R., Schulte D., Musahl C., Knippers R. Nucleic Acids Res. 21:5289-5293(1993).
[3] Tye B.-K. Trends Cell Biol. 4:160-166(1994).
[4] Koonin E.V. Nucleic Acids Res. 21:2541-2547(1993).

[0990] 341. Macrophage migration inhibitory factor family signature (MIF)

A protein called macrophage migration inhibitory factor (MIF) [1] seems to exert an important role in host inflammatory
responses. It plays a pivotal role in the host response to endotoxic shock and appears to serve as a pituitary "stress"
hormone that regulates systemic inflammatory responses. MIF is a secreted protein of 115 residues which is not
processed from a larger precursor. D-dopachrome tautomerase [2] is a mammalian cytoplasmic enzyme involved in melanin
biosynthesis that tautomerizes D-dopachrome with concomitant decarboxylation to give 5,6-dihydroxyindole (DHI).
It is a protein of 117 residues highly related to MIF. It must be noted that MIF binds glutathione and has been said to
be related to glutathione S-transferases. This assertion was later disproved [3]. As a signature pattern for these
proteins, a conserved region was selected located in the central section.
[0991] Consensus pattern: [DE]-P-C-A-x(3)-[LIVM]-x-S-I-G-x-[LIVM]-G

[1] Bucala R. Immunol. Lett. 43:23-26(1994).
[2] Odh G., Hindemith A., Rosengren A.-M., Rosengren E., Rorsman H. Biochem. Biophys. Res. Commun.
197:619-624(1993).
[3] Pearson W.R. Protein Sci. 3:525-527(1994).

[0992] 342. MIP family signature 

Recently the sequence of a number of different proteins, that all seem to be transmembrane channel proteins, has 
been found to be highly related [1 to 4].These proteins are listed below. - Mammalian major intrinsic protein (MIP). MIP 
is the major component of lens fiber gap junctions. Gap junctions mediate direct exchange of ions and small molecule 

35 from one cell to another. - Mammalian aquaporins [5]. These proteins form water-specific channels that provide the 
plasma membranes of red cells and kidney proximal and collecting tubules with high permeability to water, thereby 
permitting water to move in the direction of an osmotic gradient. - Soybean nodulin-26, a major component of the 
peribacteroid membrane induced during nodulation in legume roots after Rhizobium infection. - Plants tonoplast intrinsic 
proteins (TIP). There are various isoforms of TIP: alpha (seed), gamma, Rt (root), and Wsi (water-stress induced). 

40 These proteins may allow the diffusion of water, amino acids and/or peptides from the tonoplast interior to the cytoplasm. 

- Bacterial glycerol facilitator protein (gene gIpF), which facilitates the movement of glycerol across the cytoplasmic 
membrane. - Salmonella typhimurium propanediol diffusion facilitator (gene pduF). - Yeast FPS1, a glycerol uptake/ 
efflux facilitator protein. - Drosophila neurogenic protein 'big brain* (bib). This protein may mediate intercellular com- 
munication; it may functions by allowing the transport of certain molecules(s) and thereby sending a signal for an 

45 exodermal cell to become an epidermoblast instead of a neuroblast. - Yeast hypothetical protein YFL054c. - A hypo- 
thetical protein from the pepX region of lactococcus lactis. The MIP family proteins seem to contain six transmembrane 
segments. Computer analysis shows that these protein probably arose by a tandem, intragenic duplication event from 
an ancestral protein that contained three transmembrane segments. As a signature pattern a well conserved region 
was selected which is located in a probable cytoplasmic loop between the second and third transmembrane regions. 

[0993] Consensus pattern: [HNQA]-x-N-P-[STA]-[LIVMF]-[ST]-[LIVMF]-[GSTAFY]
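As a rough illustration of how a consensus pattern of this kind can be applied, the sketch below translates the pattern syntax into a Python regular expression and scans a peptide with it. The function name `prosite_to_regex` and the test peptide are hypothetical, and the pattern is written with the damaged brackets read as [ST] and [GSTAFY], which is an assumption.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a simple consensus pattern into a Python regex.

    Handles single residues, 'x' wildcards, bracketed alternatives and
    repeat counts such as x(2) or x(2,3); any other syntax is rejected.
    """
    element_re = re.compile(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?")
    parts = []
    for element in pattern.strip("-").split("-"):
        m = element_re.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        core = "." if core == "x" else core  # 'x' means any residue
        if low is None:
            parts.append(core)
        elif high is None:
            parts.append(f"{core}{{{low}}}")
        else:
            parts.append(f"{core}{{{low},{high}}}")
    return "".join(parts)

# Pattern as read from the text above (bracket repairs are an assumption).
MIP_PATTERN = "[HNQA]-x-N-P-[STA]-[LIVMF]-[ST]-[LIVMF]-[GSTAFY]"
mip_regex = re.compile(prosite_to_regex(MIP_PATTERN))

# Hypothetical test peptide, not taken from the document.
peptide = "MSGAHLNPAVTLGLL"
hit = mip_regex.search(peptide)
```

Here `hit.group()` gives the matched stretch and `hit.start()` its zero-based offset in the peptide. A regular expression is only one convenient way to apply such a pattern; dedicated pattern-scanning tools handle the full syntax.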

[ 1] Reizer J., Reizer A., Saier M.H. Jr. CRC Crit. Rev. Biochem. 28:235-257(1993).
[ 2] Baker M.E., Saier M.H. Jr. Cell 60:185-186(1990).
[ 3] Pao G.M., Wu L.-F., Johnson K.D., Hoefte H., Chrispeels M.J., Sweet G., Sandal N.N., Saier M.H. Jr. Mol. Microbiol. 5:33-37(1991).
[ 4] Wistow G.J., Pisano M.M., Chepelinsky A.B. Trends Biochem. Sci. 16:170-171(1991).
[ 5] Chrispeels M.J., Agre P. Trends Biochem. Sci. 19:421-425(1994).
[0994] 343. Mandelate racemase / muconate lactonizing enzyme family

Mandelate racemase (MR) and muconate lactonizing enzyme (MLE) are two bacterial enzymes involved in aromatic acid catabolism. They catalyze mechanistically distinct reactions, yet they are related at the level of their primary, quaternary (homooctamer) and tertiary structures. A number of other proteins also seem [...] chloromuconate cyclois- [...]

[...] metal ion]-

[0997] 344. Merozoite Surface Antigen 2 (MSA-2) family

[0998] Thomas AW, Carr DA, Carter JM, Lyon JA; Mol Biochem Parasitol 1990;43:211-220.



[1002] 

bers: 105 



[1003] 347. O-methyltransferase [...] These enzymes utilise S-adenosyl methionine.

k:;?] SnrmThirsLTavr^^^^^^ 

for photosynthetio growth. Ch,, and BchD have m^^^^^^^ 

riOOSl [1] Walker CJ, Willows RD. Biochem J 1997,327.321- J JJ. KJ reiers.eii , 
BM. Hunter CN, Henningsen KW, J Bacteriol 1998:180:699.704. 

[1009] 349. Plasmid recombination enzyme (Mob.Pre) [...] that is independent of RecA. In such [...]

[...] protein is also known as Mob (conjugative mobilization).

[1011] [1] Priebe SD, Lacks SA; J Bacteriol 1989;171:4778-4784.

[1012] 350. Monooxygenase 

[...] Ballou DP, Ludwig ML; Science 1994;266:110-114.

Members of [...] are found in proteasome regulatory subunits, eukaryotic initiation factor 3 (eIF3) sub- [...]

[...] 1998;7:1250-1254. [2] Hershey JW, Asano K, Naranda T, Vornlocher HP, Hanachi P, Merrick WC; Biochimie 1996;78:903-907.

[1018] 352. Myc [...] leucine zipper class of transcription factors, see HLH

[1] Facchini LM, Penn LZ; FASEB J 1998;12:633-651. [2] Grandori C, Eisenman RN; Trends Biochem Sci 1997;22:177-181.

[1021] 353. (Metallothio_2) Metallothionein. Members of this family are metallothioneins. These proteins are cysteine-rich proteins that bind to heavy metals. Members of this family appear to be closest to Class II metallothioneins, see metalthio. Number of members: 55
[1022] [1] Medline: 98267202. Characterization of gene repertoires at mature stage of citrus fruits through random sequencing and analysis of redundant metallothionein-like genes expressed during fruit development. Moriguchi T, Kita M, Hisada S, Endo-Inagaki T, Omura M; Gene 1998;211:221-227.
[1023] 354. MAGE family

[1024] Members of the MAGE (melanoma antigen-encoding gene) family are expressed in a wide variety of tumors but not in normal cells, with the exception of the male germ cells, placenta, and, possibly, cells of the developing embryo. The cellular function of this family is unknown.

[1025] [1] McCurdy DK, Tai LQ, Nguyen J, Wang Z, Yang HM, Udar N, Naiem F, Concannon P, Gatti RA; Mol Genet Metab 1998;63:3-13.

[1026] 355. Malic enzymes signature. Malic enzymes, or malate oxidoreductases, catalyze the oxidative decarboxylation of malate into pyruvate, a reaction important for a wide range of metabolic pathways. There are three related forms of malic enzyme [1,2,3]:

- NAD-dependent malic enzyme (EC 1.1.1.38), which uses preferentially NAD and has the ability to decarboxylate oxaloacetate (OAA). It is found in bacteria and insects.
- NAD-dependent malic enzyme (EC 1.1.1.39), which uses preferentially NAD and is unable to decarboxylate OAA. It is found in the mitochondrial matrix of plants and is a heterodimer of highly related subunits.
- NADP-dependent malic enzyme (EC 1.1.1.40), which has a preference for NADP and has the ability to decarboxylate OAA. This form has been found in fungi, animals and plants. In mammals, there are two isozymes: one mitochondrial and the other cytosolic. Plants also have two isozymes: chloroplastic and cytosolic.

There are two other proteins which are closely structurally related to malic enzymes:

- Escherichia coli protein sfcA, whose function is not yet known but which could be an NAD or NADP-dependent malic enzyme.
- Yeast hypothetical protein YKL029c, a probable malic enzyme.

There are three well conserved regions in the enzyme sequences. Two of them seem to be involved in binding NAD or NADP. The significance of the third one, located in the central part of the enzymes, is not yet known. This region has been developed as a signature pattern for these enzymes.
[1027] Consensus pattern: F-x-[DV]-D-x(2)-G-T-[GSA]-x-[IV]-x-[LIVMA]-[GAST](2)-[LIVMF](2)
[1028] [ 1] Artus N.N., Edwards G.E. FEBS Lett. 182:225-233(1985).
[ 2] Loeber G., Infante A.A., Maurer-Fogy I., Krystek E., Dworkin M.B. J. Biol. Chem. 266:3016-3021(1991).
[ 3] Long J.J., Wang J.-L., Berry J.O. J. Biol. Chem. 269:2827-2833(1994).
[1029] 356. (matrixin) Matrixins cysteine switch (aka peptidase_M10)

[1030] Mammalian extracellular matrix metalloproteinases (EC 3.4.24.-), also known as matrixins [1] (see <PDOC00129>), are zinc-dependent enzymes. They are secreted by cells in an inactive form (zymogen) that differs from the mature enzyme by the presence of an N-terminal propeptide. A highly conserved octapeptide is found two residues downstream of the C-terminal end of the propeptide. This region has been shown to be involved in autoinhibition of matrixins [2,3]; a cysteine within the octapeptide chelates the active site zinc ion, thus inhibiting the enzyme. This region has been called the 'cysteine switch' or 'autoinhibitor region'.
[1031] A cysteine switch has been found in the following zinc proteases:

- MMP-1 (EC 3.4.24.7) (interstitial collagenase).
- MMP-2 (EC 3.4.24.24) (72 Kd gelatinase).
- MMP-3 (EC 3.4.24.17) (stromelysin-1).
- MMP-7 (EC 3.4.24.23) (matrilysin).
- MMP-8 (EC 3.4.24.34) (neutrophil collagenase).
- MMP-9 (EC 3.4.24.35) (92 Kd gelatinase).
- MMP-10 (EC 3.4.24.22) (stromelysin-2).
- MMP-11 (EC 3.4.24.-) (stromelysin-3).
- MMP-12 (EC 3.4.24.65) (macrophage metalloelastase).
- MMP-13 (EC 3.4.24.-) (collagenase 3).
- MMP-14 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 1).
- MMP-15 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 2).
- MMP-16 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 3).
- Sea urchin hatching enzyme (EC 3.4.24.12) (envelysin) [4].
- Chlamydomonas reinhardtii gamete lytic enzyme (GLE) [5].

[1032] Consensus pattern: P-R-C-[GN]-x-P-[DR]-[LIVSAPKQ] [C chelates the zinc ion]. Sequences known to belong to this class detected by the pattern: ALL, except for cat MMP-7 and mouse MMP-11.
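To make the role of the annotated residue concrete, the following minimal sketch locates the cysteine-switch octapeptide in a sequence and reports the position of the cysteine that the pattern marks as chelating the zinc ion. The regular expression is a hand translation of the consensus pattern, and the function name and both test peptides are hypothetical.

```python
import re

# Hand translation of P-R-C-[GN]-x-P-[DR]-[LIVSAPKQ]; the capture group
# isolates the cysteine annotated as chelating the zinc ion.
CYS_SWITCH = re.compile(r"PR(C)[GN].P[DR][LIVSAPKQ]")

def find_cysteine_switch(sequence: str):
    """Return (motif start, cysteine position) pairs, both 1-based."""
    return [(m.start() + 1, m.start(1) + 1)
            for m in CYS_SWITCH.finditer(sequence)]

# Hypothetical propeptide fragment used only to exercise the function.
propeptide = "AAMKQPRCGVPDVAQF"
hits = find_cysteine_switch(propeptide)

# The same fragment with the cysteine replaced by serine should not match.
mutant_hits = find_cysteine_switch("AAMKQPRSGVPDVAQF")
```

Reporting the cysteine separately from the motif start is a design choice: it is the residue of functional interest, so downstream steps such as mutation checks can use its position directly.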
[...] Matrisian L.M., Breathnach R. J. Biol. Chem. 263:11892-11899

[1033] 357. Vertebrate metallothioneins signature (metalthio) [...] cadmium, nickel, etc.

[...] includes mammalian MTs as well as MTs from [...]

[...] groups together MTs from various species such as [...] which display none or only very distant correspondence to class I MTs. [...] 20 of these residues are [...]

[1034] Consensus pattern: C-x-C-[GSTAP]-x(2)-C-x-C-x(2)-C-x-C-x(2)-C-x-K

[ 1] Hamer D.H. Annu. Rev. Biochem. 55:913-951(1986).
[ 2] Kagi J.H.R., Schaffer A. Biochemistry 27:8509-8515(1988).
[ 3] Binz P.-A. Thesis, 1996, University of Zurich.

„, unknown ft,n*n ™»9„i»d W ^ ^^^^s . JSTdr^, sp.c d.U in m. .« 

l„ ,h. mltochonanon. ■ ye»l mUoonondrla, FAD ■^Z^T^I^X'^^^^^'^'^- ' ^"^ 

wa«; selected located just after the first transmembrane region. 
Soaq^S'sLnTrpatt^^^ 

[ 1] Klingenberg M. Trends Biochem. Sci. 15:108-112(1990).
[ 2] Walker J.E. Curr. Opin. Struct. Biol. 2:519-526(1992).
[ 3] Kuan J., Saier M.H. Jr. CRC Crit. Rev. Biochem. 28:209-233(1993).
[ 4] Kuan J., Saier M.H. Jr. Res. Microbiol. 144:671-672(1993).
[ 5] Nelson D.R., Lawson J.E., Klingenberg M., Douglas M.G. J. Mol. Biol. 230:1159-1170(1993).
[ 6] Palmieri F. FEBS Lett. 346:48-54(1994).
[ 7] Jank B., Habermann B., Schweyen R.J., Link T.A. Trends Biochem. Sci. 18:427-428(1993).






[1037] 359. Prokaryotic molybdopterin oxidoreductases signatures (molybdopterin)

A number of different prokaryotic oxidoreductases that require and bind a molybdopterin cofactor have been shown [1,2,3] to share a number of regions of sequence similarity. These enzymes are:

- Escherichia coli respiratory nitrate reductase (EC 1.7.99.4). This enzyme complex allows the bacteria to use nitrate as an electron acceptor during anaerobic growth. The enzyme is composed of three different chains: alpha, beta and gamma. The alpha chain (gene narG) is the molybdopterin-binding subunit. Escherichia coli encodes a second, closely related, nitrate reductase complex which also contains a molybdopterin-binding alpha chain (gene narZ).
- Escherichia coli anaerobic dimethyl sulfoxide reductase (DMSO reductase). DMSO reductase is the terminal reductase during anaerobic growth on various sulfoxide and N-oxide compounds. DMSO reductase is composed of three chains: A, B and C. The A chain (gene dmsA) binds molybdopterin.
- Escherichia coli biotin sulfoxide reductases (genes bisC and bisZ). This enzyme reduces a spontaneous oxidation product of biotin, BDS, back to biotin. It may serve as a scavenger, allowing the cell to use biotin sulfoxide as a biotin source.
- Methanobacterium formicicum formate dehydrogenase (EC 1.2.1.2). The alpha chain (gene fdhA) of this dimeric enzyme binds a molybdopterin cofactor.
- Escherichia coli formate dehydrogenases -H (gene fdhF), -N (gene fdnG) and -O (gene fdoG). These enzymes are responsible for the oxidation of formate to carbon dioxide. In addition to molybdopterin, the alpha (catalytic) subunit also contains an active site selenocysteine.
- Wolinella succinogenes polysulfide reductase chain. This enzyme is a component of the phosphorylative electron transport system with polysulfide as the terminal acceptor. It is composed of three chains: A, B and C. The A chain (gene psrA) binds molybdopterin.
- Salmonella typhimurium thiosulfate reductase (gene phsA).
- Escherichia coli trimethylamine-N-oxide reductase (EC 1.6.6.9) (gene torA) [4].
- Nitrate reductase (EC 1.7.99.4) from Klebsiella pneumoniae (gene nasA), Alcaligenes eutrophus, Escherichia coli, Rhodobacter sphaeroides, Thiosphaera pantotropha (gene napA), and Synechococcus PCC 7942 (gene narB).

These proteins range from 715 amino acids (fdhF) to 1246 amino acids (narZ) in size. Three signature patterns for these enzymes were derived. The first is based on a conserved region in the N-terminal section and contains two cysteine residues perhaps involved in binding the molybdopterin cofactor. It should be noted that this region is not present in bisC. The second pattern is derived from a conserved region located in the central part of these enzymes.

[1038] Consensus pattern: [STAN]-x-[CH]-x(2,3)-C-[STAG]-[GSTVMF]-x-C-x-[LIVMFYW]-x-[LIVMA]-x(3,4)-[DENQKHT]

Consensus pattern: [STA]-x-[STAC](2)-x(2)-[STA]-D-[LIVMY](2)-L-P-x-[STAC](2)-x(2)-E

Consensus pattern: A-x(3)-[GDT]-I-x-[DNQTK]-x-[DEA]-x-[LIVM]-x-[LIVMC]-x-[NS]-x(2)-[GS]-x(5)-A-x-[LIVM]-[ST]
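Because the first of these patterns contains variable-length gaps such as x(2,3) and x(3,4), the stretch of sequence it can cover is not a single fixed length. The sketch below computes the minimum and maximum span of each pattern. The function name is hypothetical, and the three pattern strings are written with the damaged brackets and separators repaired, which is an assumption.

```python
import re

# One pattern element: a residue, 'x', or a bracketed set, optionally
# followed by a repeat count (n) or a range (n,m).
ELEMENT = re.compile(r"(?:x|[A-Z]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?")

def pattern_length_range(pattern: str):
    """Return (shortest, longest) number of residues a pattern can span."""
    shortest = longest = 0
    for element in pattern.strip("-").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        low, high = m.groups()
        low = int(low) if low else 1      # no count means exactly one residue
        high = int(high) if high else low  # no upper bound means fixed count
        shortest += low
        longest += high
    return shortest, longest

# Patterns as read from the text (OCR repairs are an assumption).
MOLYBDOPTERIN_PATTERNS = [
    "[STAN]-x-[CH]-x(2,3)-C-[STAG]-[GSTVMF]-x-C-x-[LIVMFYW]-x-[LIVMA]-x(3,4)-[DENQKHT]",
    "[STA]-x-[STAC](2)-x(2)-[STA]-D-[LIVMY](2)-L-P-x-[STAC](2)-x(2)-E",
    "A-x(3)-[GDT]-I-x-[DNQTK]-x-[DEA]-x-[LIVM]-x-[LIVMC]-x-[NS]-x(2)-[GS]-x(5)-A-x-[LIVM]-[ST]",
]
spans = [pattern_length_range(p) for p in MOLYBDOPTERIN_PATTERNS]
```

Knowing the span is useful when a hit has to be placed relative to a protein of 715 to 1246 residues, for example to check that the first pattern falls in the N-terminal section.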

[ 1] Wootton J.C., Nicolson R.E., Cock J.M., Walters D.E., Burke J.F., Doyle W.A., Bray R.C. Biochim. Biophys. Acta 1057:157-185(1991).
[ 2] Bilous P.T., Cole S.T., Anderson W.F., Weiner J.H. Mol. Microbiol. 2:785-795(1988).
[ 3] Trieber C.A., Rothery R.A., Weiner J.H. J. Biol. Chem. 269:7103-7109(1994).
[ 4] Mejean V., Iobbi-Nivol C., Lepelletier M., Giordano G., Chippaux M., Pascal M.-C. Mol. Microbiol. 11:1169-1179(1994).

[1039] 360. Bacterial mutT domain signature

The bacterial mutT protein is involved in the GO system [1] responsible for removing an oxidatively damaged form of guanine (8-hydroxyguanine or 7,8-dihydro-8-oxoguanine) from DNA and the nucleotide pool. 8-oxo-dGTP is inserted opposite to dA and dC residues of template DNA with almost equal efficiency, thus leading to A.T to G.C transversions. MutT specifically degrades 8-oxo-dGTP to the monophosphate with the concomitant release of pyrophosphate. MutT is a small protein of about 12 to 15 Kd. It has been shown [2,3] that a region of about 40 amino acid residues, which is found in the N-terminal part of mutT, can also be found in a variety of other prokaryotic, viral, and eukaryotic proteins. These proteins are:

- Streptococcus pneumoniae mutX.
- A mutT homolog from plasmid pSAM2 of Streptomyces ambofaciens.
- Bartonella bacilliformis invasion protein A (gene invA).
- Escherichia coli dATP pyrophosphohydrolase.
- Protein D250 from African swine fever viruses.
- Proteins D9 and D10 from a variety of poxviruses.
- Mammalian 7,8-dihydro-8-oxoguanine triphosphatase (EC 3.1.6.-) [4].
- Mammalian diadenosine 5',5'''-P1,P4-tetraphosphate asymmetrical hydrolase (Ap4Aase) (EC 3.6.1.17) [5], which cleaves A-5'-PPPP-5'A to yield AMP and ATP.
- A protein encoded on the antisense RNA of the basic fibroblast growth factor gene in higher vertebrates.
- Yeast protein YSA1.
- Escherichia coli hypothetical protein yfaO.
- Escherichia coli hypothetical protein yrfE.
- Bacillus subtilis hypothetical protein yqkG.
- Bacillus subtilis hypothetical protein yzgD.
- Yeast hypothetical protein YGL067w.



Consensus pattern: G-x(5)-E-x(4)-[STAGC]-[...]
[ 1] Michaels M.L., Miller J.H. J. Bacteriol. 174:6321-6325(1992).
[ 2] Koonin E.V. Nucleic Acids Res. 21:4847-4847(1993).
[ 3] [...] 11:323-330(1994).

S "^^^ t;^^:^:^ S.. Ma. H., SeKi9UC. M. . Biol. Che.. 268: 

S?:rSan..^ 
(1995). 

[1042] 361. Myb DNA-binding [...]

The retroviral oncogene [...] myb encode nuclear DNA-binding proteins that specifically recognize the sequence YAAC(G/T)G [...]. The family also includes the following proteins:

- Drosophila D-myb [2].
- Vertebrate myb-like proteins A-myb and B-myb [3].
- Maize C1 protein, a trans-acting factor which controls the expression of genes involved in anthocyanin biosynthesis.
- Maize [...], a trans-acting factor which regulates the biosynthetic pathway of a flavonoid-derived pigment in certain floral [...].
- [...] required for the initiation of [...].
- [...]-related proteins in maize and barley, whose roles are not yet known [4].
- Yeast [...] for the HIS4 gene.
- Yeast REB1 [8], which recognizes sites within both the enhancer [...] transcription, as well as upstream of many genes transcribed by RNA polymerase [...].
- [...] factor whose activity is required for cell cycle progression and [...] function.
- Yeast hypothetical protein YMR213w.

One of the most [...] regions in all of these proteins is a domain of 160 amino acids. It consists of three tandem repeats of [...] missing in retroviral v-myb sequences [...]



xxxxxxxxxWxxxEDxxxxxxxxxxxxxxWxxIxxxxxxRxxxxxxK 
»»*♦♦**♦***♦»****»******'** : Position of the patterns. 



[1043] Consensus pattern: W-[ST]-x(2)-E-[DE]-x(2)-[LIV]
Consensus pattern: W-x(2)-[LI]-[SAG]-x(4,5)-R-x(8)-[YW]-x(3)-[LIVM]

Note: this pattern detects the three copies of the domain in myb, D-myb, [...]; the second of the two complete copies of plant myb-related proteins; and the last two copies of yeast BAS1.

1 1] Biednkapp H.. Borgmeyer U., Sippel A.E Klempnauer K.H. Nature 335:«3J837(1988V 

irNrr^N:^r^^^^^^^^ 

J^Orro^^ " Aihma P.. Peterson T. Proc. Na«. Aca. Sc. -^^^^^^^^^^^^^ 

i^irrr^^^ik^rp^j-^^^^^ 

[ 7] Tice-Baldwin K., Fink G.R., Arndt K.T. Science 246:931-935(1989).
[ 8] Ju Q., Morrow B.E., Warner J.R. Mol. Cell. Biol. 10:5226-5234(1990).
[ 9] Klempnauer K.-H., Sippel A.E. EMBO J. 6:2719-2725(1987).



[1044] 362. NAD-dependent glycerol-3-phosphate dehydrogenase signature
NAD-dependent glycerol-3-phosphate dehydrogenase (EC 1.1.1.8) (GPD) catalyzes the reversible reduction of dihydroxyacetone phosphate to glycerol-3-phosphate. It is a eukaryotic cytosolic homodimeric protein of about 40 Kd. As a signature pattern, a glycine-rich region that is probably [1] involved in NAD-binding was selected.
[1045] Consensus pattern: G-[AT]-[LIVM]-K-[DN]-[LIVM](2)-A-x-[GA]-x-G-[LIVMF]-x-[DE]-G-[LIVM]-x-[LIVMFYW]-G-x-N

[1046] [ 1] Otto J., Argos P., Rossmann M.G. Eur. J. Biochem. 109:325-330(1980).
[1047] 363. Nucleosome assembly protein (NAP)

[1048] It is thought that NAPs may be involved in regulating gene expression as a result of histone accessibility [1].

[1] Rodriguez P, Munroe D, Prawitt D, Chu LL, Brie E, Kim J, Reid LH, Davies C, Nakagama H, Loebbert R, Winterpacht A, Petruzzi MJ, Higgins MJ, Nowak N, Evans G, Shows T, Weissman BE, Zabel B, Housman DE, Pelletier J; Genomics 1997;44:253-265.

[2] Schnieders F, Dork T, Arnemann J, Vogel T, Werner M, Schmidtke J; Hum Mol Genet 1996;5:1801-1807.

[1049] 364. NB-ARC domain
van der Biezen EA, Jones JD; Curr Biol 1998;8:226-227.

[1050] 365. Nucleoside diphosphate kinases active site

[1051] Nucleoside diphosphate kinases (EC 2.7.4.6) (NDK) [1] are enzymes required for the synthesis of nucleoside triphosphates (NTP) other than ATP. They provide NTPs for nucleic acid synthesis, CTP for lipid synthesis, UTP for polysaccharide synthesis and GTP for protein elongation, signal transduction and microtubule polymerization. In eukaryotes, there seems to be a small family of NDK isozymes, each of which acts in a different subcellular compartment and/or has a distinct biological function. Eukaryotic NDK isozymes are hexamers of two highly related chains (A and B) [2]. By random association (A6, A5B ... AB5, B6), these two kinds of chain form isoenzymes differing in their isoelectric point. NDK are proteins of 17 Kd that act via a ping-pong mechanism in which a histidine residue is phosphorylated by transfer of the terminal phosphate group from ATP. In the presence of magnesium, the phosphoenzyme can transfer its phosphate group to any NDP, to produce an NTP. NDK isozymes have been sequenced from prokaryotic and eukaryotic sources. It has also been shown [3] that the Drosophila awd (abnormal wing discs) protein is a microtubule-associated NDK. Mammalian NDK is also known as metastasis inhibition factor nm23. The sequence of NDK has been highly conserved through evolution. There is a single histidine residue conserved in all known NDK isozymes, which is involved in the catalytic mechanism [2]. Our signature pattern contains this residue.
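The random association of the two chain types into hexamers can be enumerated directly. The sketch below lists the seven possible compositions from A6 to B6 and, purely as an illustration, the binomial fraction expected for each. The function names are hypothetical, and the assumption that subunits associate independently in proportion to their abundance is ours, not a statement from the text.

```python
from math import comb

def hexamer_compositions(subunits: int = 6):
    """List every A/B composition of an oligomer, from all-A to all-B."""
    def label(chain: str, count: int) -> str:
        if count == 0:
            return ""
        return chain if count == 1 else f"{chain}{count}"
    return [label("A", subunits - b) + label("B", b)
            for b in range(subunits + 1)]

def composition_fractions(fraction_a: float, subunits: int = 6):
    """Binomial fraction of each composition.

    Assumes independent random association of subunits, with
    fraction_a the share of A chains in the pool (an assumption).
    """
    return [comb(subunits, b)
            * fraction_a ** (subunits - b)
            * (1.0 - fraction_a) ** b
            for b in range(subunits + 1)]

isoforms = hexamer_compositions()
fractions = composition_fractions(0.5)
```

With equal amounts of A and B chains the mixed A3B3 hexamer is the most abundant species under this model, while the pure A6 and B6 forms are the rarest.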

[1052] Consensus pattern: N-x(2)-H-[GA]-S-D-[SA]-[LIVMPKNE] [H is the putative active site residue]

[ 1] Parks R., Agarwal R. (In) The Enzymes (3rd edition) 8:307-334(1973).
[ 2] Gilles A.-M., Presecan E., Vonica A., Lascu I. J. Biol. Chem. 266:8784-8789(1991).
[ 3] Biggs J., Hersperger E., Steeg P.S., Liotta L.A., Shearn A. Cell 63:933-940(1990).

[1053] 366. Nitrite and sulfite reductases iron-sulfur/siroheme-binding site (NIR.SIR)

Nitrite reductases (NiR) [1] catalyze the reduction of nitrite into ammonium, the second step in the assimilation of nitrate. There are two types of NiR: the higher plant chloroplastic form of NiR (EC 1.7.7.1) is a monomeric protein that uses reduced ferredoxin as the electron donor, while fungal and bacterial NiR (EC 1.6.6.4) are homodimeric proteins that use NAD(P)H as the electron donor. Both forms of NiR contain a siroheme-Fe and iron-sulfur centers.

Sulfite reductase (NADPH) (EC 1.8.1.2) (SIR) [2] is the bacterial enzyme that catalyzes the reduction of sulfite to sulfide. SIR is an oligomeric enzyme with a subunit composition of alpha(8)-beta(4); the alpha component is a flavoprotein (SIR-FP), while the beta component is a siroheme, iron-sulfur protein (SIR-HP).

Sulfite reductase (ferredoxin) (EC 1.8.7.1) [3] is a cyanobacterial and plant monomeric enzyme that also catalyzes the reduction of sulfite to sulfide.

Anaerobic sulfite reductase (EC 1.8.1.-) (ASR) [4] is a bacterial enzyme that catalyzes the NADH-dependent reduction of sulfite to sulfide. ASR is an oligomeric enzyme composed of three different subunits. The C component (gene asrC) seems to be a siroheme, iron-sulfur protein.

These enzymes share a region of sequence similarity in their C-terminal half; this region, which spans about 80 amino acids, includes four conserved cysteine residues. Two of the Cys are grouped together at the beginning of the domain, and the two others are grouped in the middle of the domain. The cysteines are involved in the binding of the iron-sulfur center; the last one also binds the siroheme group [2]. A signature pattern from the region around the second cluster of cysteines was derived.

[1054] Consensus pattern: [STV]-G-C-x(3)-C-x(6)-[DE]-[LIVMF]-[GAT]-[LIVMF] [The two Cs are iron-sulfur ligands]
[ 1] Campbell W.H., Kinghorn J.R. Trends Biochem. Sci. 15:315-319(1990).
[ 2] [...] Siegel L.M., Getzoff E.D. Science 270:59-67(1995).
[ 3] Gisselmann [...], Klausmeier P., [...] J.-D. Biochim. Biophys. Acta 1144:102-106(1993).
[ 4] Huang C.J., Barrett E.L. J. Bacteriol. 173:1544-1553(1991).

[...] Nmt is a monomeric protein of about 50 to 60 Kd whose sequence appears to be well conserved. [...] patterns. The first one is located in the central section, the second in the C-terminal part.
[1056] Consensus pattern: E-I-N-F-L-C-x-H-K

Consensus pattern: K-F-G-x-G-D-G

[1057] [ 1] Rudnick D.A., McWherter C.A., Gokel G.W., Gordon J.I. Adv. Enzymol. 67:375-430(1993).
[1058] 368. ADP-glucose pyrophosphorylase signatures (NTP_transferase)

[1059] ADP-glucose pyrophosphorylase (glucose-1-phosphate adenylyltransferase) [1,2] (EC 2.7.7.27) catalyzes a very important step in the [...] of alpha 1,4-glucans (glycogen or starch) in bacteria and plants: [...] the activated glucosyl donor, ADP-glucose, from glucose-1-phosphate and ATP. ADP-glucose pyrophosphorylase is a tetrameric allosterically regulated enzyme. It is a homotetramer in bacteria, while in plant chloroplasts and amyloplasts it is a heterotetramer of two different, yet evolutionary related, subunits. There are a number of conserved regions in the sequence of bacterial and plant ADP-glucose pyrophosphorylase subunits. [...] as signature patterns. The first two are N-terminal and have been proposed to be part of the [...] binding sites in the Escherichia coli enzyme (gene glgC). The third pattern corresponds to a conserved region in the central part of the enzymes.

[1060] Consensus pattern: [AG]-G-G-x-G-[STK]-x-L-x(2)-L-[TA]-x(3)-A-x-P-A-[LV]
Consensus pattern: W-[FY]-x-G-[ST]-A-[DNSH]-[AS]-[LIVMFYW]
Consensus pattern: [APV]-[GS]-M-G-[LIVMN]-Y-[IVC]-[LIVMFY]-x(2)-[DENPHK]

[ 1] Nakata P.A., Greene T.W., Anderson J.M., Smith-White B.J., Okita T.W., Preiss J. Plant Mol. Biol. 17:1089-1093

[ 2] Preiss J., Ball K., Hutney J., Smith-White B.J., Li L., Okita T.W. Pure Appl. Chem. 63:535-544(1991).

[...] maintaining the pH of actively metabolizing cells. The molecular [...]

There are [...] regions (M) at the amino-terminus and a large cytoplasmic [...] the carboxy terminus. The transmembrane regions M3-M12 share identity with other members of the family. The M6 and M7 regions are [...]. Thus, this is thought to be the region that is involved in the transport of sodium and hydrogen ions. The cytoplasmic region has little similarity throughout the family.
[1063] [1] Dibrov P, Fliegel L; FEBS Lett 1998;424:1-5. [2] [...] 1997;272:22373-22376. [3] Numata M, Petrecca K, Lake N, Orlowski J; J Biol Chem 1998;273:6951-6959.
[1064] 370. Sodium:sulfate symporter family signature (Na_sulph_symp)

Integral membrane proteins that mediate the intake of a wide variety of molecules with the concomitant uptake of sodium ions (sodium symporters) can be grouped, on the basis of sequence and functional similarities, into a number of distinct families. One of these families currently consists of the following proteins:

- Mammalian sodium/sulfate cotransporter [1].
- Mammalian renal sodium/dicarboxylate cotransporter [2], which transports succinate and citrate.
- Mammalian intestinal sodium/dicarboxylate cotransporter.
- [...] response regulator SAC1 [3].
- Caenorhabditis elegans hypothetical proteins [...].
- Escherichia coli hypothetical protein yfbS.
- Haemophilus influenzae hypothetical protein [...].
- [...] strain PCC 6803 hypothetical protein sll0640.
- Methanococcus jannaschii hypothetical protein MJ0672.

These transporters are proteins of from 430 to 620 amino acids which are highly hydrophobic and which probably contain about [...] transmembrane regions. As a signature pattern, a conserved region was selected which is located in or near the penultimate transmembrane region.

[1065] Consensus pattern: [STACP]-S-x(2)-F-x(2)-P-[LIVM]-[GSA]-x(3)-N-x-[LIVM]-V

[ 1] Markovich D., Forgo J., Stange G., Biber J., Murer H. Proc. Natl. Acad. Sci. U.S.A. 90:8073-8077(1993).
[ 2] Pajor A.M. Am. J. Physiol. 270:642-648(1996).
[ 3] Davies J.P., Yildiz F.H., Grossman A. EMBO J. 15:2150-2159(1996).
[1066] 371. NifU-like domain

[1067] This is an alignment of the carboxy-terminal domain. This is the only common region between the NifU protein from nitrogen-fixing bacteria and rhodobacterial species. The biochemical function of NifU is unknown [1].
[1068] Ouzounis C, Bork P, Sander C; Trends Biochem Sci 1994;19:199-200.

[1069] 372. Nitrilases / cyanide hydratase signatures

Nitrilases (EC 3.5.5.1) are enzymes that convert nitriles into their corresponding acids and ammonia. They are widespread in microbes as well as in plants, where they convert indole-3-acetonitrile to the hormone indole-3-acetic acid. A conserved cysteine has been shown [1,2] to be essential for enzyme activity: it seems to be involved in a nucleophilic attack on the nitrile carbon atom. Cyanide hydratase (EC 4.2.1.66) converts HCN to formamide. In phytopathogenic fungi, it is used to avoid the toxic effect of cyanide released by wounded plants [3]. The sequence of cyanide hydratase is evolutionary related to that of nitrilases. Yeast hypothetical proteins YIL164c and YIL165c also belong to this family. As signature patterns for these enzymes, two conserved regions were selected. The first is located in the N-terminal section, while the second, which contains the active site cysteine, is located in the central section.
[1070] Consensus pattern: G-x(2)-[LIVMFY](2)-x-[IF]-x-E-x(2)-[LIVM]-x-G-Y-P

Consensus pattern: G-[GAQ]-x(2)-C-[WA]-E-[NH]-x(2)-[PST]-[LIVMFYS]-x-[KR] [C is the active site residue]
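When a family is described by two signature patterns, a sequence can be screened against both and the hits reported by name. The sketch below does this with hand-translated regular expressions. The pattern labels, the function name and the synthetic sequence (assembled only so that each pattern matches once) are all hypothetical.

```python
import re

# Hand translations of the two consensus patterns given above.
NITRILASE_SIGNATURES = {
    "nitrilase_n_terminal": re.compile(
        r"G.{2}[LIVMFY]{2}.[IF].E.{2}[LIVM].GYP"),
    "nitrilase_active_site": re.compile(
        r"G[GAQ].{2}C[WA]E[NH].{2}[PST][LIVMFYS].[KR]"),
}

def signature_hits(sequence: str):
    """Map each signature name to the 1-based start positions of its hits."""
    return {name: [m.start() + 1 for m in regex.finditer(sequence)]
            for name, regex in NITRILASE_SIGNATURES.items()}

# Synthetic sequence built to contain one copy of each motif; it is not a
# real nitrilase.
synthetic = "MK" + "GAALLAIAEAALAGYP" + "TT" + "GGAACWENAAPLAK" + "E"
hits = signature_hits(synthetic)

# Position of the cysteine annotated as the active site residue: the fifth
# residue of the second pattern, counted from the 1-based motif start.
active_site_cys = hits["nitrilase_active_site"][0] + 4
```

Requiring both signatures before calling a sequence a candidate member is one possible policy; a single hit to the active-site pattern might be treated as weaker evidence.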

[ 1] Kobayashi M., Izui H., Nagasawa T., Yamada H. Proc. Natl. Acad. Sci. U.S.A. 90:247-251(1993).
[ 2] Kobayashi M., Komeda H., Yanaka N., Nagasawa T., Yamada H. J. Biol. Chem. 267:20746-20751(1992).
[ 3] Wang P., Vanetten H.D. Biochem. Biophys. Res. Commun. 187:1048-1054(1992).

[1071] 373. NusB family

[1072] The NusB protein is involved in the regulation of rRNA biosynthesis by transcriptional antitermination.
[1073] Huenges M, Rolz C, Gschwind R, Peteranderi R, Berglechner F, Richter G, Bacher A, Kessler H, Gemmecker G; EMBO J 1998;17:4092-4100.

[1074] 374. (Neur Chan) Neurotransmitter-gated ion-channels signature

Neurotransmitter-gated ion-channels [1,2,3,4] provide the molecular basis for rapid signal transmission at chemical synapses. They are post-synaptic oligomeric transmembrane complexes that transiently form an ionic channel upon the binding of a specific neurotransmitter. Presently, the sequences of subunits from five types of neurotransmitter-gated receptors are known:

- The nicotinic acetylcholine receptor (AchR), an excitatory cation channel. In the motor endplates of vertebrates, it is composed of four different subunits (alpha, beta, gamma and delta or epsilon) with a molar stoichiometry of 2:1:1:1. In neurones, the AchR receptor is composed of two different types of subunits: alpha and non-alpha (also called beta). Nicotinic AchRs are also found in invertebrates.
- The glycine receptor, an inhibitory chloride ion channel. The glycine receptor is a pentamer composed of two different subunits (alpha and beta).
- The gamma-aminobutyric-acid (GABA) receptor, which is also an inhibitory chloride ion channel. The quaternary structure of the GABA receptor is complex; at least four classes of subunits are known to exist (alpha, beta, gamma, and delta) and there are many variants in each class (for example, six variants of the alpha class have already been sequenced).
- The serotonin 5HT3 receptor. Serotonin is a biogenic hormone that functions as a neurotransmitter, a hormone and a mitogen. There are seven major groups of serotonin receptors; six of these groups (5HT1, 5HT2, and 5HT4 to 5HT7) transduce extracellular signals by activating G proteins, while 5HT3 is a ligand-gated cation-specific ion channel which, when activated, causes fast, depolarizing responses in neurons.
- The glutamate receptor, an excitatory cation channel. Glutamate is the main excitatory neurotransmitter in the brain. At least three different types of glutamate receptors have been described and are named according to their selective agonists (kainate, N-methyl-D-aspartate (NMDA) and quisqualate).

All known sequences of subunits from neurotransmitter-gated ion-channels are structurally related. They are composed of a large extracellular glycosylated N-terminal ligand-binding domain, followed by three hydrophobic transmembrane regions which form the ionic channel, followed by an intracellular region of variable length. A fourth hydrophobic region is found at the C-terminal of the sequence. The sequences of subunits from the AchR, GABA, 5HT3, and Gly receptors are clearly evolutionary related and share many regions of sequence similarity. These sequence similarities are either absent or very weak in the Glu receptors. In the N-terminal extracellular domain of AchR/GABA/5HT3/Gly receptors, there are two conserved cysteine residues, which, in AchR, have been shown to form a disulfide bond essential to the tertiary structure of the receptor. A number of amino acids between the two disulfide-bonded cysteines are also conserved. Therefore this region was used as a signature pattern for this subclass of proteins.
[1075] Consensus pattern: C-x-[LIVMFQ]-x-[LIVMF]-x(2)-[FY]-P-x-D-x(3)-C [The two Cs are linked by a disulfide bond]
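Since every element of this pattern has a fixed length, the two disulfide-bonded cysteines always sit the same distance apart in a match. The sketch below reports both cysteine positions for each hit so that the spacing can be checked. The regular expression is a hand translation of the pattern, and the function name and the synthetic sequence are hypothetical.

```python
import re

# Hand translation of C-x-[LIVMFQ]-x-[LIVMF]-x(2)-[FY]-P-x-D-x(3)-C.
CYS_LOOP = re.compile(r"C.[LIVMFQ].[LIVMF].{2}[FY]P.D.{3}C")

def disulfide_pairs(sequence: str):
    """Return 1-based positions of the first and last cysteine of each hit."""
    return [(m.start() + 1, m.end()) for m in CYS_LOOP.finditer(sequence)]

# Synthetic sequence built to satisfy the pattern once; not a real receptor.
synthetic = "AA" + "CALALAAFPADAAAC" + "GG"
pairs = disulfide_pairs(synthetic)

# Spacing between the two cysteines in each hit (13 residues lie between
# them, so the positions differ by 14).
spacings = [last - first for first, last in pairs]
```

Returning the cysteine pair rather than only the motif start keeps the output aligned with what the pattern annotation describes, namely which two residues form the disulfide bond.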

[ 1] Stroud R.M., McCarthy M.P., Shuster M. Biochemistry 29:11009-11023(1990).
[ 2] Betz H. Neuron 5:383-392(1990).
[ 3] Dingledine R., Myers S.J., Nicholas R.A. FASEB J. 4:2632-2645(1990).
[ 4] Barnard E.A. Trends Biochem. Sci. 17:368-374(1992).
[1076] 375. Orotidine 5'-phosphate decarboxylase active site

Orotidine 5'-phosphate decarboxylase (OMPdecase) [1,2] catalyzes the last step in the de novo biosynthesis of pyrimidines, the decarboxylation of OMP into UMP. In higher eukaryotes OMPdecase is part, with orotate phosphoribosyltransferase, of a bifunctional enzyme, while the prokaryotic and fungal OMPdecases are monofunctional proteins. Some parts of the sequence of OMPdecase are well conserved across species. The best conserved region is located in the N-terminal half of OMPdecases and is centered around a lysine residue which is essential for the catalytic function of the enzyme. This region has been developed as a signature pattern.

S??? Consensus pattern: ILIVMFTA]-ILIVMF]-x-D-x-K-x{2)-D-l-[GPl-x-T-[UVMTA] [K is the aclwe site residue]- 

[ 1] Jacquet M., Guilbaud R., Garreau H. Mol. Gen. Genet. 211:441-445(1988). 
[ 2] Kimsey H.H., Kaiser D. J. Biol. Chem. 267:819-824(1992). 

[1078] 376. ATP synthase delta (OSCP) subunit signature

ATP synthase (proton-translocating ATPase) (EC 3.6.1.34) [1,2] is a component of the cytoplasmic membrane of
eubacteria, the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATPase complex is
composed of an oligomeric transmembrane sector, called CF(0), which acts as a proton channel, and a catalytic core,
called CF(1). One of the subunits of the ATPase complex, known as subunit delta in bacteria and chloroplasts or the
Oligomycin Sensitivity Conferral Protein (OSCP) in mitochondria, seems to be part of the stalk that links CF(0) to CF(1).
It either transmits conformational changes from CF(0) into CF(1) or is involved in proton conduction [3].
The different delta/OSCP subunits are proteins of approximately 200 amino-acid residues - once the transit peptide
has been removed in the chloroplast and mitochondrial forms - which show only moderate sequence homology.
The signature pattern used to detect ATPase delta/OSCP subunits is based on a conserved region in the C-terminal
section.

[1079] Consensus pattern: [LIVM]-x-[LIVMFYT]-x(3)-[LIVMT]-[DENQK]-x(2)-[LIVM]-x-(illegible)-[LIVM]-[KRHENQ]-x-[GSEN]

[ 1] Futai M., Noumi T., Maeda M. Annu. Rev. Biochem. 58:111-136(1989).
[ 2] Senior A.E. Physiol. Rev. 68:177-231(1988).
[ 3] Engelbrecht S., Junge W. Biochim. Biophys. Acta 1015:379-390(1990).
[1080] 377. Aspartate and ornithine carbamoyltransferases signature

Aspartate carbamoyltransferase (EC 2.1.3.2) (ATCase) catalyzes the conversion of aspartate and carbamoyl phosphate
to carbamoylaspartate, the second step in the de novo biosynthesis of pyrimidine nucleotides [1]. In prokaryotes
ATCase consists of two subunits: a catalytic chain (gene pyrB) and a regulatory chain (gene pyrI), while in eukaryotes
it is a domain in a multi-functional enzyme (called URA2 in yeast, rudimentary in Drosophila, and CAD in mammals
[2]) that also catalyzes other steps of the biosynthesis of pyrimidines.

Ornithine carbamoyltransferase (EC 2.1.3.3) (OTCase) catalyzes the conversion of ornithine and carbamoyl phosphate
to citrulline. In mammals this enzyme participates in the urea cycle [3] and is located in the mitochondrial matrix. In
prokaryotes and eukaryotic microorganisms it is involved in the biosynthesis of arginine. In some bacterial species it
is also involved in the degradation of arginine [4] (the arginine deaminase pathway).

It has been shown [5] that these two enzymes are evolutionarily related. The predicted secondary structures of both
enzymes are similar and there are some regions of sequence similarity. One of these regions includes three residues
which have been shown, by crystallographic studies [6], to be implicated in binding the phosphoryl group of carbamoyl
phosphate.

This region was selected as a signature for these enzymes.

Consensus pattern: F-x-[EK]-x-S-[GT]-R-T [S, R, and the 2nd T bind carbamoyl phosphate]

Note: the residue in position 3 of the pattern distinguishes an ATCase (Glu) from an OTCase (Lys).

[ 1] Lerner C.G., Switzer R.L. J. Biol. Chem. 261:11156-11165(1986).
[ 2] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B. BioEssays 15:157-164(1993).
[ 3] Takiguchi M., Matsubasa T., Amaya Y., Mori M. BioEssays 10:163-166(1989).
[ 4] Baur H., Stalon V., Falmagne P., Luethi E., Haas D. Eur. J. Biochem. 166:111-117(1987).
[ 5] Houghton J.E., Bencini D.A., O'Donovan G.A., Wild J.R. Proc. Natl. Acad. Sci. U.S.A. 81:4864-4868(1984).
[ 6] Ke H.-M., Honzatko R.B., Lipscomb W.N. Proc. Natl. Acad. Sci. U.S.A. 81:4037-4040(1984).
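
The note on position 3 of the carbamoyltransferase signature lends itself to a small illustration. The following is a sketch under stated assumptions: the regular expression is a hand translation of F-x-[EK]-x-S-[GT]-R-T, the function name classify_carbamoyltransferase is hypothetical, and the short test strings are synthetic rather than real enzyme sequences.

```python
import re

# Hand translation of F-x-[EK]-x-S-[GT]-R-T.
# The parentheses capture the residue in position 3 of the pattern.
CARBAMOYLTRANSFERASE = re.compile(r"F.([EK]).S[GT]RT")

def classify_carbamoyltransferase(sequence):
    """Return 'ATCase', 'OTCase', or None when the signature is absent."""
    match = CARBAMOYLTRANSFERASE.search(sequence)
    if match is None:
        return None
    # Glu (E) in position 3 indicates an ATCase, Lys (K) an OTCase.
    return "ATCase" if match.group(1) == "E" else "OTCase"

print(classify_carbamoyltransferase("AAFAEASGRTAA"))  # synthetic, Glu at position 3
print(classify_carbamoyltransferase("AAFAKASTRTAA"))  # synthetic, Lys at position 3
```

A match on the signature alone is only suggestive; the pattern is short, so a real annotation would normally be checked against the rest of the sequence.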

[1081] 378. Oleosins signature

Oleosins [1] are the proteinaceous components of plants' lipid storage bodies called oil bodies. Oil bodies are small
droplets (0.2 to 1.5 µm in diameter) containing mostly triacylglycerol that are surrounded by a phospholipid/oleosin
annulus. Oleosins may have a structural role in stabilizing the lipid body during desiccation of the seed, by preventing
coalescence of the oil. They may also provide recognition signals for specific lipase anchorage in lipolysis during
seedling growth. Oleosins are found in the monolayer lipid/water interface of oil bodies and probably interact with both
the lipid and phospholipid moieties.

Oleosins are proteins of 16 Kd to 24 Kd and are composed of three domains: an N-terminal hydrophilic region of
variable length (from 30 to 60 residues); a central hydrophobic domain of about 70 residues; and a C-terminal
amphipathic region of variable length (from 60 to 100 residues). The central hydrophobic domain is proposed to be made up
of beta-strand structure and to interact with the lipids [2]. It is the only domain whose sequence is conserved and
therefore a section from that domain was selected as a signature pattern.

[1082] Consensus pattern: [AG]-[ST]-x(2)-[AG]-x(2)-[LIVM]-[SAD]-T-P-[LIVMF](4)-F-S-P-[LIVM](3)-P-A

[ 1] Murphy D.J., Keen J.N., O'Sullivan J.N., Au D.M.Y., Edwards E.-W., Jackson P.J., Cummins I., Gibbons T.,
Shaw C.H., Ryan A.J. Biochim. Biophys. Acta 1088:86-94(1991).
[ 2] Tzen J.T.C., Lie G.C., Huang A.H.C. J. Biol. Chem. 267:15626-15634(1992).

[1083] 379. (Orbi VP5) Orbivirus outer capsid protein VP5

[1084] This paper shows the location of the different capsid proteins and their relation to each other.
[1085] [1] Schoehn G, Moss SR, Nuttall PA, Hewat EA; Virology 1997;235:191-200.
[1086] 380. Orn/DAP/Arg decarboxylases family 2 signatures

Pyridoxal-dependent decarboxylases acting on ornithine, lysine, arginine and related substrates can be classified into
two different families on the basis of sequence similarities [1,2,3]. The second family consists of:

- Eukaryotic ornithine decarboxylase (EC 4.1.1.17) (ODC). ODC catalyzes the transformation of ornithine into
putrescine.
- Prokaryotic diaminopimelic acid decarboxylase (EC 4.1.1.20) (DAPDC). DAPDC catalyzes the conversion of
diaminopimelic acid into lysine, the last step in the biosynthesis of lysine.
- Pseudomonas syringae pv. tabaci protein tabA. tabA is probably involved in the biosynthesis of tabtoxin and is
highly similar to DAPDC.
- Bacterial and plant biosynthetic arginine decarboxylase (EC 4.1.1.19) (ADC). ADC catalyzes the transformation
of arginine into agmatine, the first step in the biosynthesis of putrescine from arginine.

The above proteins, while most probably evolutionarily related, do not share extensive regions of sequence similarity.
Two of the conserved regions were selected as signature patterns. The first pattern contains a conserved lysine residue
which is known, in mouse ODC [4], to be the site of attachment of the pyridoxal-phosphate group. The second pattern
contains a stretch of three consecutive glycine residues and has been proposed to be part of a substrate-binding region
[5].

These enzymes are collectively known as group IV decarboxylases [3].

[1087] Consensus pattern: [FY]-[PA]-x-K-[SACV]-[NHCLFW]-x(4)-[LIVMF]-[LIVMTA]-x(2)-[LIVMA]-x(3)-[GTE] [K is
the pyridoxal-P attachment site]

Consensus pattern: [GS]-x(2,6)-[LIVMSCP]-x(2)-[LIVMF]-[DNS]-[LIVMCA]-G-G-G-[LIVMFY]-[GSTPCEQ]

[ 1] Bairoch A. Unpublished observations (1993).
[ 2] Martin C., Cami B., Yeh P., Stragier P., Parsot C., Patte J.-C. Mol. Biol. Evol. 5:549-559(1988).
[ 3] Sandmeier E., Hale T.I., Christen P. Eur. J. Biochem. 221:997-1002(1994).
[ 4] Poulin R., Lu L., Ackermann B., Bey P., Pegg A.E. J. Biol. Chem. 267:150-158(1992).
[ 5] Moore R.C., Boyle S.M. J. Bacteriol. 172:4631-4640(1990).
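
Because this family is described by two signature patterns, a scan can report both at once: where the lysine of the first pattern falls, and whether the three-glycine stretch of the second is present. The sketch below assumes hand-translated regular expressions for the two patterns; the names LYSINE_SITE, GLYCINE_STRETCH and scan_family2 are hypothetical, and the test sequence is synthetic.

```python
import re

# Hand translations of the two consensus patterns quoted above.
# The parentheses in the first regex capture the lysine (K) position.
LYSINE_SITE = re.compile(
    r"[FY][PA].(K)[SACV][NHCLFW].{4}[LIVMF][LIVMTA].{2}[LIVMA].{3}[GTE]"
)
GLYCINE_STRETCH = re.compile(
    r"[GS].{2,6}[LIVMSCP].{2}[LIVMF][DNS][LIVMCA]GGG[LIVMFY][GSTPCEQ]"
)

def scan_family2(sequence):
    """Return (1-based lysine position or None, True if the G-G-G pattern matches)."""
    first = LYSINE_SITE.search(sequence)
    lysine_position = first.start(1) + 1 if first else None
    has_glycine_stretch = GLYCINE_STRETCH.search(sequence) is not None
    return lysine_position, has_glycine_stretch

# Synthetic sequence built to satisfy both patterns (not a real decarboxylase).
toy = "M" + "FPAKSNAAAALLAALAAAG" + "TT" + "GAALAALDLGGGLG"
print(scan_family2(toy))
```

Reporting the two results separately, rather than requiring both, reflects the statement above that these proteins do not share extensive regions of similarity.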

[1088] 381. Osteopontin signature

Osteopontin is an acidic phosphorylated glycoprotein of about 40 Kd which is abundant in the mineral matrix of bones
and which binds tightly to hydroxyapatite [1,2,3]. It is suggested that osteopontin might function as a cell attachment
factor and could play a key role in the adhesion of osteoclasts to the mineral matrix of bone.

Osteopontin-K is a kidney protein which is highly similar to osteopontin and probably also involved in cell adhesion.

As a signature pattern a highly conserved region located at the N-terminal extremity of the mature protein was selected.
[1089] Consensus pattern: [KQ]-x-[TA]-x(2)-[GA]-S-S-E-E-K

[ 1] Butler W.T. Connect. Tissue Res. 23:123-36(1989).
[ 2] Gorski J.P. Calcif. Tissue Int. 50:391-396(1992).
[ 3] Denhardt D.T., Guo X. FASEB J. 7:1475-1482(1993).

[1090] 382. Oxysterol-binding protein family signature

A number of eukaryotic proteins that seem to be involved with sterol synthesis and/or its regulation have been found
[1] to be evolutionarily related:

- Mammalian oxysterol-binding protein (OSBP). A protein of about 800 amino-acid residues that binds a variety of
oxysterols: oxygenated derivatives of cholesterol. OSBP seems to play a complex role in the regulation of sterol
metabolism.
- Yeast proteins HES1 and KES1; highly related proteins of 434 residues that seem to play a role in ergosterol
synthesis.
- Yeast OSH1, a protein of 859 residues that also plays a role in ergosterol synthesis.
- Yeast hypothetical protein YHR001w (437 residues).
- Yeast hypothetical protein YHR073w (996 residues).
- Yeast hypothetical protein YKR003w (448 residues).

[1091] All these proteins contain a moderately conserved domain of about 250 residues located in the C-terminal
half of OSBP, OSH1 and YHR073w and in the central section of the other proteins. As a signature pattern, the best
conserved part of this domain was selected, a region that contains a conserved pentapeptide.
[1092] Consensus pattern: E-[KQ]-x-S-H-[HR]-P-P-x-[STACF]-A

[1093] [ 1] Jiang B., Brown J.L., Sheraton J., Fortin N., Bussey H. Yeast 10:341-353(1994).
[1094] 383. FMN oxidoreductase
[1095] 384. Oxidoreductase FAD/NAD-binding domain
Number of members: 250
[1]
The sequence of squash NADH:nitrate reductase and its relationship to the sequences of other flavoprotein
oxidoreductases. A family of flavoprotein pyridine nucleotide cytochrome reductases.
Hyde GE, Crawford NM, Campbell W;
J Biol Chem 1991;266:23542-23547.
[2]
Crystal structure of the FAD-containing fragment of corn nitrate reductase at 2.5 A resolution: relationship to other
flavoprotein reductases.
Lu G, Campbell WH, Schneider G, Lindqvist Y;
Structure 1994;2:809-821.

[1096] 385. (oxidored molyb) Eukaryotic molybdopterin oxidoreductases signature

A number of different eukaryotic oxidoreductases that require and bind a molybdopterin cofactor have been shown [1]
to share a few regions of sequence similarity. These enzymes are:

- Xanthine dehydrogenase (EC 1.1.1.204), which catalyzes the oxidation of xanthine to uric acid with the concomitant
reduction of NAD. Structurally, this enzyme of about 1300 amino acids consists of at least three distinct domains:
an N-terminal 2Fe-2S ferredoxin-like iron-sulfur binding domain (see <PDOC00175>), a central FAD/NAD-binding
domain and a C-terminal Mo-pterin domain.
- Aldehyde oxidase (EC 1.2.3.1), which catalyzes the oxidation of aldehydes into acids. Aldehyde oxidase is highly
similar to xanthine dehydrogenase in its sequence and domain structure.
- Nitrate reductase (EC 1.6.6.1), which catalyzes the reduction of nitrate to nitrite. Structurally, this enzyme of about
900 amino acids consists of an N-terminal Mo-pterin domain, a central cytochrome b5-type heme-binding domain
(see <PDOC00170>) and a C-terminal FAD/NAD-binding cytochrome reductase domain.
- Sulfite oxidase (EC 1.8.3.1), which catalyzes the oxidation of sulfite to sulfate. Structurally this enzyme of about
460 amino acids consists of an N-terminal cytochrome b5-binding domain followed by a Mo-pterin domain.

There are a few conserved regions in the sequence of the molybdopterin-binding domain of these enzymes. The pattern
used to detect these proteins is based on one of them. It contains a cysteine residue which could be involved in binding
the molybdopterin cofactor.

[1097] Consensus pattern: [GA]-x(3)-[KRNQHT]-x(11 (remainder of pattern illegible in source)

[1098] [1] Wootton J.C., Nicolson R.E., Cock J.M., Walters D.E., Burke J.F., Doyle W.A., Bray R.C. Biochim. Biophys.
Acta 1057:157-185(1991).

[1099] 386. (oxidored q1) NADH-ubiquinone/plastoquinone (complex I), various chains
This family is part of complex I which catalyses the transfer of two electrons from NADH to ubiquinone in a reaction
that is associated with proton translocation across the membrane.
Number of members: 1824
[1]
Medline: 93110040
The NADH:ubiquinone oxidoreductase (complex I) of respiratory chains.
Walker JE;
Q Rev Biophys 1992;25:253-324.
[1100] 387. (oxidored q3) NADH-ubiquinone/plastoquinone oxidoreductase chain 6. 179 members.
[1101] 388. (oxidored q5) NADH-ubiquinone oxidoreductase chain 4, amino terminus
[1102] [1] Walker JE; Q Rev Biophys 1992;25:253-324.

[1103] 389. (oxidored q6) Respiratory-chain NADH dehydrogenase 20 Kd subunit signature

Respiratory-chain NADH dehydrogenase (EC 1.6.5.3) [1,2] (also known as complex I or NADH-ubiquinone
oxidoreductase) is an oligomeric enzymatic complex located in the inner mitochondrial membrane which also seems
to exist in the chloroplast and in cyanobacteria (as a NADH-plastoquinone oxidoreductase). Among the 25 to 30
polypeptide subunits of this bioenergetic enzyme complex there is one with a molecular weight of 20 Kd (in mammals)
[3], which is a component of the iron-sulfur (IP) fragment of the enzyme. It seems to bind a 4Fe-4S iron-sulfur cluster.
The 20 Kd subunit has been found to be:

- Nuclear encoded, as a precursor form with a transit peptide, in mammals and in Neurospora crassa.
- Mitochondrial encoded in Paramecium (gene psbG).
- Chloroplast encoded in various higher plants (gene ndhK or psbG).

The 20 Kd subunit is highly similar to [4]:

- Synechocystis strain PCC 6803 proteins psbG1 and psbG2.
- Subunit B of Escherichia coli NADH-ubiquinone oxidoreductase (gene nuoB).
- Subunit NQO6 of Paracoccus denitrificans NADH-ubiquinone oxidoreductase.
- Subunit 7 of Escherichia coli formate hydrogenlyase (gene hycG).
- Subunit I of Escherichia coli hydrogenase-4 (gene hyfI).

As a signature pattern, a highly conserved region was selected, located in the central section of this subunit, which
contains a conserved cysteine that is probably involved in the binding of the 4Fe-4S center.

[1104] Consensus pattern: [GN]-x-D-[EAST]-[LIVMF](2)-P-[IV]-D-[LIVMFYW](2)-x-P-x-C-P-[PT] [The C is a putative
4Fe-4S ligand]

[ 1] Ragan C.I. Curr. Top. Bioenerg. 15:1-36(1987).
[ 2] Weiss H., Friedrich T., Hofhaus G., Preis D. Eur. J. Biochem. 197:563-576(1991).
[ 3] Arizmendi J.M., Runswick M.J., Skehel J.M., Walker J.E. FEBS Lett. 301:237-242(1992).
[ 4] Weidner U., Geier S., Pfock A., Friedrich T., Leif H., Weiss H. J. Mol. Biol. 233:109-122(1993).

[1105] 390. p53 tumor antigen signature

The p53 tumor antigen [1 to 5,E1,E2] is a protein found in increased amounts in a wide variety of transformed cells.
It is also detectable in many proliferating nontransformed cells, but it is undetectable or present at low levels in resting
cells. It is frequently mutated or inactivated in many types of cancer. p53 seems to act as a tumor suppressor in some,
but probably not all, tumor types. p53 is probably involved in cell cycle regulation, and may be a trans-activator that
acts to negatively regulate cellular division by controlling a set of genes required for this process.

p53 is a phosphoprotein of about 390 amino acids which can be subdivided into four domains: a highly charged acidic
region of about 75 to 80 residues, a hydrophobic proline-rich domain (position 80 to 150), a central region (from 150
to about 300), and a highly basic C-terminal region. The sequence of p53 is well conserved in vertebrate species;
attempts to identify p53 in other eukaryotic phyla have so far been unsuccessful.

As a signature pattern for p53 a perfectly conserved stretch of 13 residues located in the central region of the protein
was selected. This region, known as domain IV in [3], is involved (along with an adjacent region) in the binding of the
large T antigen of SV40. In man this region is the focus of a variety of point mutations in cancerous tumors.
[1106] Consensus pattern: M-C-N-S-S-C-M-G-G-M-N-R-R

[ 1] Levine A.J., Momand J., Finlay C.A. Nature 351:453-456(1991).
[ 2] Levine A.J., Momand J. Biochim. Biophys. Acta 1032:119-136(1990).
[ 3] Soussi T., Caron De Fromentel C., May P. Oncogene 5:945-952(1990).
[ 4] Lane D.P., Benchimol S. Genes Dev. 4:1-8(1990).
Anderson C.W., Mercer W.E., Appella E. J. Biol. Chem. 267:15259-15262(1992).



[1107] 391. (P5CR) Delta 1-pyrroline-5-carboxylate reductase

Delta 1-pyrroline-5-carboxylate reductase catalyzes the terminal step in the biosynthesis of proline, the reduction of
1-pyrroline-5-carboxylate into proline.

[1108] Consensus pattern: [PAL(illegible)]-x(2,3)-[LIV]-x(3)-[LIVM]-[STAC]-(illegible)-[LMF]-[DENQK]

[ 1] Delauney A.J., Verma D.P.S. Mol. Gen. Genet. 221:299-305(1990).
[ 2] Savioz A., Jeenes D.J., Kocher H.P., Haas D. Gene 86:107-111(1990).

[1109] 392. Poly-adenylate binding protein, unique domain.

11110] 393. (PAL) P»^e7'^^";- and fungi phenylpropanoid metabolism 

sr:rdTr^^^^^^^^ 
rr;s::rst?:n-rm=^^^^^ 

Histidine ammonia-lyase (EC 4.3.1.3) (histidase) catalyzes the first step in histidine degradation, the removal of an

trprro^lrr S^^^^^^^ - sue residue is well conserved and 

?i"i?rSnL^su?pTem:'S^^^^ 

[ 1] Taylor R.G., Lambert M.A., Sexsmith E.. Sadler S.J.. Ray P.N.. Mahuran D.J.. Mclnnes R.R. J. Biol. Chem. 

^^^^il^^''r^'R^ol Ree6 J., Retey J. Biochemistry 33:6462-6467(1994). 
[ 3] Schuster B., Retey J. FEBS Lett. 349:252-254(1994).
[ 4] Taylor R.G., McInnes R.R. J. Biol. Chem. 269:27473-27477(1994).

This family does not currently match all known examples of PAS domains.
PAS motifs appear in archaea, eubacteria and eukarya. Probably
the most surprising identification of a PAS domain was that in
EAG-like K+-channels [1,3].
Number of members: 308
[1]
Medline: 97446881
PAS domain S-boxes in archaea, bacteria and sensors for oxygen and redox.
Zhulin IB, Taylor BL, Dixon R;
Trends Biochem Sci 1997;22:331-333.
[2]
1.4 A structure of photoactive yellow protein, a cytosolic photoreceptor: unusual fold, active site, and chromophore.
Borgstahl GE, Williams DR, Getzoff ED;
Biochemistry 1995;34:6278-6287.
[3]
Medline: 98044337
PAS: a multifunctional domain family comes to light.
Ponting CP, Aravind L;
Curr Biol 1997;7:674-677.
and maturation. This protein belongs to a family that also includes:

- Drosophila antennal protein A5, a putative odorant-binding protein.
- Onchocerca volvulus antigen Ov-16 and the related proteins D1, D2 and D3.
- Plasmodium falciparum putative phosphatidylethanolamine-binding protein.
- Toxocara canis secreted antigen TES-26. This larval protein has been shown to bind phosphatidylethanolamine.
- Yeast protein DKA1 (also known as NSP1 or TFS1). The function of this protein is not very clear.
- Yeast hypothetical protein YLR179C.
- Caenorhabditis elegans hypothetical protein F40A3.3.

As a signature pattern, the best conserved region was selected, which is located at the end of the first third of the
sequence of these proteins.

[1114] Consensus pattern: [FYL]-x-[LV]-[LIVF]-x-[TIV]-[DC]-P-D-x-P-[SN]-x(10)-H

[ 1] Seddiqi N., Bollengier R., Alliel P.M., Perin J.P., Bonnet P., Bucquoy S., Jolles P., Schoentgen F. J. Mol. Evol.
39:655-660(1994).
[ 2] Schoentgen F., Jolles P. FEBS Lett. 369:22-6(1995).

[1115] 396. PCI domain
This domain has also been called the PINT motif (Proteasome,
Int-6, Nip-1 and TRIP-15) [1].
Number of members: 49
[1]
Medline: 98308842
The PCI domain: a common theme in three multiprotein complexes.
Hofmann K, Bucher P;
Trends Biochem Sci 1998;23:204-205.
[2]
Medline: 98266368
Homologues of 26S proteasome subunits are regulators of transcription and translation.
Aravind L, Ponting CP;
Protein Sci 1998;7:1250-1254.
[1116] 397. (PCMT) Protein-L-isoaspartate (D-aspartate) O-methyltransferase signature

Protein-L-isoaspartate (D-aspartate) O-methyltransferase (EC 2.1.1.77) (PCMT) [1] (which is also known as
L-isoaspartyl protein carboxyl methyltransferase) is an enzyme that catalyzes the transfer of a methyl group from
S-adenosylmethionine to the free carboxyl groups of D-aspartyl or L-isoaspartyl residues in a variety of peptides and
proteins. The enzyme does not act on normal L-aspartyl residues. L-isoaspartyl and D-aspartyl are the products of the
spontaneous deamidation and/or isomerization of normal L-aspartyl and L-asparaginyl residues in proteins. PCMT
plays a role in the repair and/or degradation of these damaged proteins; the enzymatic methyl esterification of the
abnormal residues can lead to their conversion to normal L-aspartyl residues. PCMT is a well-conserved and widely
distributed cytosolic protein of about 24 Kd. As a signature pattern, a conserved region in the central part of this
enzyme has been developed.
[1117] Consensus pattern: [GSA]-D-G-x(2)-G-[FYWV]-x(3)-[AS]-P-[FY]-[DN]-x-I

[1118] [ 1] Kagan R.M., McFadden H.J., McFadden P.N., O'Connor C., Clarke S. Comp. Biochem. Physiol.
117B:379-385(1997).

[1119] 398. (PCNA) Proliferating cell nuclear antigen signatures

Proliferating cell nuclear antigen (PCNA) [1,2] is a protein involved in DNA replication by acting as a cofactor for DNA
polymerase delta, the polymerase responsible for leading strand DNA replication.

A similar protein exists in yeast (gene POL30) [3] and is associated with polymerase III, the yeast analog of polymerase
delta. In baculoviruses the ETL protein has been shown [4] to be highly related to PCNA and is probably associated
with the viral encoded DNA polymerase. A homolog of PCNA is also found in archaebacteria.
As signatures for this family of proteins, two conserved regions located in the N-terminal section were selected. The
second one has been proposed to bind DNA.

[1120] Consensus pattern: [GA]-[LIVMF]-x-[LIVMA]-x-[SAV]-[LIVM]-D-x-[NSAE]-[HKR]-[VI]-x-[LY]-[VGA]-x-[LIVM]-
x-[LIVM]-x(4)-F

Consensus pattern: [RKA]-C-[DE]-[RH]-x(3)-[LIVMF]-x(3)-[LIVM]-x-[SGAN]-[LIVMF]-x-K-[LIVMF](2)

[ 1] Bravo R., Frank R., Blundell P.A., McDonald-Bravo H. Nature 326:515-517(1987).
[ 2] Suzuka I., Hata S., Matsuoka M., Kosugi S., Hashimoto J. Eur. J. Biochem. 195:571-575(1991).
[ 3] Bauer G.A., Burgess P.M.J. Nucleic Acids Res. 18:261-265(1990).
[ 4] O'Reilly D.R., Crawford A.M., Miller L.K. Nature 337:606-606(1989).



functional PDT align well with the C-termlnal part of that P;P™f J"^ I^'^ ^^jg^^s a conserved threonine 

[1123] [ 1] Fischer R.S., Zhao G., Jensen R.A. J. Gen. Microbiol. 137:1293-1301(1991).
[1124] 400. PDZ domain (also known as DHR or GLGF).
[1125] PDZ domains are found in diverse signaling proteins.

[3] Ponting CP; Protein Sci 1997;6:464-468. 

?ne:ofU;Trt^^^^^^^^^^ 

wSne InteSSe have been shown to be structurally related [1 .2.3.4]. These enzymes are. 
. Pyruvate.orthophosphate diKinase (EC 2X9.1) (PPDK^ ^^^f ^ 

L^^ersuchasB^r^de^^^^^^^^^^ ,e reversible 

■ :Sh"orp?^^^^^^^^^ 

pyruvate and lactate are used as a carbon sour^. ^ the phosphoenolpyru- 

. Phosphoenolpyruvate-protein phosphotransferase EC 2.7^3^9^^^^^^^ 

vate^ependent sugar phosphotransferase system (PTS), a^^^^^^^ 

PTS catalyzes the phosphorylation of incoming sugar substrates concomiiani wm transferred 
:e7m':mbrn:.The%en'Lra7mechanismoftheP^^^ 

to enzyme-l (El) of PTS which in turn transfers it to a phosphoryl earner protein ^nrr,. r-r p 
the phosphoryl group to a sugar-specific permease. 

. c™.™^ patwrn: pEQSKhx*IVMn*|UVMFH^lST,-N-DHUVMH^IVMFYGTH5TAUVHUVMFl- 

[GAS]-x(2)-R 

[ 1] Reizer J., Hoischen C. Reizer A.. Pham T.N.. Saier M.H. Jr. Protein Sci. ,f 50«-5210993^ 

r4!^lU»* ».. K™-a,« F.. G»». R.H., P.s». P.. HJ. M* G.n. G««t 232:332-336(1992,. 

,11291 402, (PEPCK ATP, PJ»»*"7jW^» ?r;ftrcK,m2.^ISiTef«™«onotpr»sp.«.no,p,™«» 
region was selected that contains four acidic residues and which is located in the central part of the enzyme. The
beginning of the pattern is located about 10 residues to the C-terminus of an ATP-binding motif 'A' (P-loop) (see
<PDOC00017>) and is also part of the ATP-binding domain [2].
[1130] Consensus pattern: L-I-G-D-D-E-H-x-W-x-[DE]-x-G-[IV]-x-N

Note: phosphoenolpyruvate carboxykinase (GTP) (EC 4.1.1.32), an enzyme that catalyzes the same reaction but
uses GTP instead of ATP, is not related to the above enzyme (see <PDOC00421>).

[ 1] Medina V., Pontarollo R., Glaeske D., Tabel H., Goldie H. J. Bacteriol. 172:7151-7156(1990).
[ 2] Matte A., Goldie H., Sweet R.M., Delbaere L.T.J. J. Mol. Biol. 256:126-143(1996).

[1131] 403. (Pepcase) Phosphoenolpyruvate carboxylase active sites

Phosphoenolpyruvate carboxylase (EC 4.1.1.31) (PEPcase) catalyzes the irreversible beta-carboxylation of
phosphoenolpyruvate by bicarbonate to yield oxaloacetate and phosphate. The enzyme is found in all plants and in a
variety of microorganisms. A histidine [1] and a lysine [2] have been implicated in the catalytic mechanism of this
enzyme; the regions around these active site residues are highly conserved in PEPcase from various plants, bacteria
and cyanobacteria and can be used as signature patterns for this type of enzyme.

[1132] Consensus pattern: [VT]-x-T-A-H-P-T-[EQ]-x(2)-R-[KRH] [H is an active site residue]
Consensus pattern: [IV]-M-[LIVM]-G-Y-S-D-S-x-K-D-[STAG]-G [K is an active site residue]
[1133] [ 1] Terada K., Izui K. Eur. J. Biochem. 202:797-803(1991).
[ 2] Jiao J.-A., Podesta F.E., Chollet R., O'Leary M.H., Andreo C.S. Biochim. Biophys. Acta 1041:291-295(1990).
[1134] 404. PET112 family signature

The following proteins from eukaryotes, prokaryotes and archaebacteria belong to the same family:

- Yeast mitochondrial protein PET112 [1], which plays an unknown role in the expression of mitochondrial genes,
probably at the level of translation.
- Aspergillus nidulans mitochondrial protein nempA.
- Bacillus subtilis hypothetical protein yzdD.
- Moraxella catarrhalis hypothetical protein in bloR-1 3' region.
- Mycoplasma genitalium hypothetical protein MG100.
- Methanococcus jannaschii hypothetical proteins MJ0019 and MJ0160.

The size of these proteins ranges from 419 to 630 amino acids. As a signature pattern, a conserved region located in
the N-terminal section was selected.
[1135] Consensus pattern: [DN]-x-[DN]-R-x(3)-P-L-[LIV]-E-[LIV]-x-[ST]-x-P

[1136] [ 1] Mulero J.J., Rosenthal J.K., Fox T.D. Curr. Genet. 25:299-304(1994).
[1137] 405. (PFK) Phosphofructokinase signature

Phosphofructokinase (EC 2.7.1.11) (PFK) [1,2] is a key regulatory enzyme in the glycolytic pathway. It catalyzes the
phosphorylation by ATP of fructose 6-phosphate to fructose 1,6-bisphosphate. In bacteria PFK is a tetramer of identical
36 Kd subunits. In mammals it is a tetramer of 80 Kd subunits. Each 80 Kd subunit consists of two homologous domains
which are highly related to the bacterial 36 Kd subunits. In humans there are three tissue-specific types of PFK
isozymes: PFKM (muscle), PFKL (liver), and PFKP (platelet). In yeast PFK is an octamer composed of four 100 Kd alpha
chains (gene PFK1) and four 100 Kd beta chains (gene PFK2); like the mammalian 80 Kd subunits, the yeast 100 Kd
subunits are composed of two homologous domains.

As a signature pattern for PFK a region that contains three basic residues involved in fructose-6-phosphate binding
was selected.

[1138] Consensus pattern: [RK]-x(4)-G-H-x-Q-[QR]-G-G-x(5)-D-R [The R/K, the H and the Q/R are involved in
fructose-6-P binding]

Note: Escherichia coli has two phosphofructokinase isozymes which are encoded by genes pfkA (major) and pfkB
(minor). The pfkB isozyme is not evolutionarily related to other prokaryotic or eukaryotic PFKs (see <PDOC00504>).

[ 1] Poorman R.A., Randolph A., Kemp R.G., Heinrikson R.L. Nature 309:467-469(1984).
[ 2] Heinisch J., Ritzel R.G., von Borstel R.C., Aguilera A., Rodicio R., Zimmermann F.K. Gene 78:309-321(1989).
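
The PFK signature singles out three positions as involved in fructose-6-phosphate binding. One way to illustrate this, offered as a sketch under stated assumptions, is to translate the pattern by hand into a regular expression with named groups so that each of those positions can be reported. The group names basic, his and qr, the function name f6p_binding_residues, and the synthetic test sequence are all hypothetical.

```python
import re

# Hand translation of [RK]-x(4)-G-H-x-Q-[QR]-G-G-x(5)-D-R.
# Named groups mark the three residues described as fructose-6-P binding.
PFK_SIGNATURE = re.compile(
    r"(?P<basic>[RK]).{4}G(?P<his>H).Q(?P<qr>[QR])GG.{5}DR"
)

def f6p_binding_residues(sequence):
    """Map each labelled position to (residue, 1-based position), or return None."""
    match = PFK_SIGNATURE.search(sequence)
    if match is None:
        return None
    return {
        name: (match.group(name), match.start(name) + 1)
        for name in ("basic", "his", "qr")
    }

# Synthetic sequence built to contain the signature (not a real PFK).
toy = "MS" + "RAAAAGHAQRGGAAAAADR"
print(f6p_binding_residues(toy))
```

As the note above indicates for the E. coli pfkB isozyme, the absence of a match does not by itself rule out phosphofructokinase activity.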


[1139] 406. (PGAM) Phosphoglycerate mutase family phosphohistidine signature

Phosphoglycerate mutase (EC 5.4.2.1) (PGAM) and bisphosphoglycerate mutase (EC 5.4.2.4) (BPGM) are structurally
related enzymes which catalyze reactions involving the transfer of phospho groups between the three carbon atoms
of phosphoglycerate [1,2]. Both enzymes can catalyze three different reactions, although in different proportions:

- The isomerization of 2-phosphoglycerate (2-PGA) to 3-phosphoglycerate (3-PGA) with 2,3-diphosphoglycerate
(2,3-DPG) as the primer of the reaction.
- The synthesis of 2,3-DPG from 1,3-DPG with 3-PGA as a primer.
- The degradation of 2,3-DPG to 3-PGA (phosphatase EC 3.1.3.13 activity).

The bacterial enzyme alpha-ribazole-5'-phosphate phosphatase (gene cobC), which is involved in cobalamin
biosynthesis, also belongs to this family [5].

A signature pattern was built around the phosphohistidine residue.

[1140] Consensus pattern: [LIVM]-x-R-H-G-[EQ]-x(3)-N [H is the phosphohistidine residue]

Note: some organisms harbor a form of PGAM independent of 2,3-DPG; this enzyme is not related to the family
described above [6].

[ 1] Le Boulch P., Joulin V., Garel M.-C., Rosa J., Cohen-Solal M. Biochem. Biophys. Res. Commun. 156:874-881(1988).
[ 2] White M.F., Fothergill-Gilmore L.A. FEBS Lett. 229:383-387(1988).
[ 3] Rose Z.B. Meth. Enzymol. 87:43-51(1982).
[ 4] Bazan J.F., Fletterick R.J., Pilkis S.J. Proc. Natl. Acad. Sci. U.S.A. 86:9642-9646(1989).
[ 5] O'Toole G.A., Trzebiatowski J.R., Escalante-Semerena J.C. J. Biol. Chem. 269:26503-26511(1994).
[ 6] Grana X., de Lecea L., El-Maghrabi M.R., Urena J.M., Caellas C., Carreras J., Puigdomenech P., Pilkis S.J.,
Climent F. J. Biol. Chem. 267:12797-12803(1992).

to neuroleukin, a neurotrophic factor which supports the survival of various °f ^ 

Consensus pattern: [GS]-x-[LIVM]-[LIVMFYW]-x(4)-[FY]-[DN]-Q-x-G-V-E-x(2)-K

[ 1] Achari A., Marshall S.E., Muirhead H., Palmieri R.H., Noltmann E.A. Philos. Trans. R. Soc. Lond., B, Biol.
Sci. 293:145-157(1981).
[ 2] Smith M.W., Doolittle R.F. J. Mol. Evol. 34:544-545(1992).
[ 3] Faik P., Walker J.I.H., Redmill A.A.M., Morgan M.J. Nature 332:455-456(1988).

[1143] 408. (PGK) Phosphoglycerate kinase signature

Phosphoglycerate kinase (EC 2.7.2.3) (PGK) [1] catalyzes the second step in the second phase of glycolysis. As a
signature pattern, a conserved region in the N-terminal region was selected.
Consensus pattern: [KRHGTCVN]-[VT]-[LIVMF]-[LIVMC]-R-x-D-x-N-[SACV]-P
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[1144] [ 1] Watson H.C.. Littlechild J.A. Biochem. Soc. Trans. 18:187-190(1990). 

[1145] 409. (PGM PMM) Phosphoglucomutase and phosphomannomutase phosphoserine signature 

Phosphoglucomutase (EC 5.4.2.2) (PGM). PGM is an enzyme responsible for the conversion of D-glucose 1-phosphate
into D-glucose 6-phosphate. PGM participates in both the breakdown and synthesis of glucose [1].

Phosphomannomutase (EC 5.4.2.8) (PMM). PMM is an enzyme responsible for the conversion of D-mannose
1-phosphate into D-mannose 6-phosphate. PMM is required for different biosynthetic pathways in bacteria. For
example, in enterobacteria such as Escherichia coli there are two different genes coding for this enzyme: rfbK,
which is involved in the synthesis of the O antigen of lipopolysaccharide, and cpsG, which is required for the synthesis
of the M antigen capsular polysaccharide [2]. In Pseudomonas aeruginosa PMM (gene algC) is involved in the
biosynthesis of the alginate layer [3] and in Xanthomonas campestris (gene xanA) it is involved in the biosynthesis
of xanthan [4]. In Rhizobium strain NGR234 (gene noeK) it is involved in the biosynthesis of the nod factor.

Phosphoacetylglucosamine mutase (EC 5.4.2.3), which converts N-acetyl-D-glucosamine 1-phosphate into the
6-phosphate isomer.

The catalytic mechanism of both PGM and PMM involves the formation of a phosphoserine intermediate [1]. The
sequence around the serine residue is well conserved and can be used as a signature pattern.

In addition to PGM and PMM there are at least three uncharacterized proteins that belong to this family [5,6]: 

- Urease operon protein ureC from Helicobacter pylori.
- Escherichia coli protein mrsA.
- Paramecium tetraurelia parafusin, a phosphoglycoprotein involved in exocytosis.
- A Methanococcus vannielii hypothetical protein in the 3' region of the gene for ribosomal protein S10.

[1146] Consensus pattern: [GSA]-[LIVM]-x-[LIVM]-[ST]-[PGA]-S-H-x-P-x(4)-[GNHE] [S is the phosphoserine residue]
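
The consensus patterns in this catalogue use a PROSITE-style notation: elements are separated by hyphens, `x` stands for any residue, square brackets list the residues allowed at a position, and a number in parentheses is a repeat count. As a minimal sketch only, such a pattern could be turned into a regular expression roughly as follows. The function names `prosite_to_regex` and `phosphoserine_positions` are hypothetical helpers introduced for illustration, the braces form `{...}` (excluded residues) is assumed from general PROSITE usage rather than from this text, and the test sequence is made up.

```python
import re

# PGM/PMM phosphoserine signature as quoted above (OCR-corrected brackets).
PGM_PMM = "[GSA]-[LIVM]-x-[LIVM]-[ST]-[PGA]-S-H-x-P-x(4)-[GNHE]"

# One pattern element: a residue set, an excluded set, a single residue or x,
# optionally followed by a repeat count such as (4) or (2,4).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a PROSITE-style consensus pattern into a regex string."""
    pieces = []
    for element in pattern.split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError("unsupported pattern element: %r" % element)
        core, low, high = match.groups()
        if core == "x":
            core = "."                      # x = any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # {ABC} = anything except A, B, C
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        pieces.append(core)
    return "".join(pieces)


def phosphoserine_positions(sequence):
    """Return 0-based positions of the signature serine in each match."""
    regex = re.compile(prosite_to_regex(PGM_PMM))
    # The serine is the seventh element, and every element before it is one
    # residue wide, so it sits 6 residues after the start of the match.
    return [m.start() + 6 for m in regex.finditer(sequence)]
```

A match only indicates that a sequence is compatible with the signature; it does not by itself establish that the protein belongs to the family.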

Note: PMM from fungi do not belong to this family. 

[ 1] Dai J.B., Liu Y., Ray W.J. Jr., Konno M. J. Biol. Chem. 267:6322-6337(1992).
[ 2] Stevenson G., Lee S.J., Romana L.K., Reeves P.R. Mol. Gen. Genet. 227:173-180(1991).
[ 3] Zielinski N.A., Chakrabarty A.M., Berry A. J. Biol. Chem. 266:9754-9763(1991).
[ 4] Koeplin R., Arnold W., Hoette B., Simon R., Wang G., Puehler A. J. Bacteriol. 174:191-199(1992).
[ 5] Bairoch A. Unpublished observations (1993).
[ 6] Subramanian S.V., Wyroba E., Andersen A.P., Satir B.H. Proc. Natl. Acad. Sci. U.S.A. 91:9832-9836(1994).

[1147] 410. PH domain profile 

The 'pleckstrin homology' (PH) domain is a domain of about 100 residues that occurs in a wide range of proteins
involved in intracellular signaling or as constituents of the cytoskeleton [1 to 7].
The function of this domain is not clear; several putative functions have been suggested:

- binding to the beta/gamma subunit of heterotrimeric G proteins,
- binding to lipids, e.g. phosphatidylinositol-4,5-bisphosphate,
- binding to phosphorylated Ser/Thr residues,
- attachment to membranes by an unknown mechanism.

It is possible that different PH domains have totally different ligand requirements.

The 3D structure of several PH domains has been determined [8]. All known cases have a common structure consisting
of two perpendicular anti-parallel beta sheets, followed by a C-terminal amphipathic helix. The loops connecting the
beta-strands differ greatly in length, making the PH domain relatively difficult to detect. There are no totally invariant
residues within the PH domain.

Proteins reported to contain one or more PH domains belong to the following families:

- Pleckstrin, the protein where this domain was first detected, is the major substrate of protein kinase C in platelets.
  Pleckstrin is one of the rare proteins to contain two PH domains.
- Ser/Thr protein kinases such as the Akt/Rac family, the beta-adrenergic receptor kinases, the mu isoform of PKC
  and the trypanosomal NrkA family.
- Tyrosine protein kinases belonging to the Btk/Itk/Tec subfamily.
- Guanine nucleotide releasing factor GNRP (Ras-GRF) (which contains 2 PH domains), guanine nucleotide exchange
  proteins like vav, dbl, SoS and yeast CDC24, GTPase activating proteins like rasGAP and BEM2/IPL2, and the human
  break point cluster protein bcr.
- Cytoskeletal proteins such as dynamin (see <PDOC00362>), Caenorhabditis elegans kinesin-like protein unc-104
  (see <PDOC00343>), spectrin beta-chain, syntrophin (2 PH domains) and yeast nuclear migration protein NUM1.
- Mammalian phosphatidylinositol-specific phospholipase C (PI-PLC) (see <PDOC50007>) isoforms gamma and delta.
  Isoform gamma contains two PH domains, the second one is split into two parts separated by about 400 residues.
- Oxysterol binding proteins OSBP, yeast OSH1 and YHR073w.
- Mouse protein citron, a putative rho/rac effector that binds to the GTP-bound forms of rho and rac.
- Several yeast proteins involved in cell cycle regulation and bud formation like BEM2, BEM3, BUD4 and the
  BEM1-binding proteins BOI2 (BEB1) and BOI1 (BOB1).
- Caenorhabditis elegans protein MIG-10.
- Caenorhabditis elegans hypothetical proteins C04D8.1, K06H7.4 and ZK632.12.
- Yeast hypothetical proteins YBR129c and YHR155w.

The profile for the PH domain, which has been developed by Toby Gibson at the EMBL, covers the total length of the
domain. Several proteins contain large insertions in the PH domain and are thus difficult to detect with this profile. In
some of these cases, the profile will align only to one half of the PH domain.

Sequences known to belong to this class detected by the pattern: ALL. However, although all sequences containing PH
domains are detected, not all PH domains are. Some of the split domains lie below the cutoff threshold.

[ 1] Mayer B.J., Ren R., Clark K.L., Baltimore D. Cell 73:629-630(1993).
[ 2] Haslam R.J., Koide H.B., Hemmings B.A. Nature 363:309-310(1993).
[ 3] Musacchio A., Gibson T.J., Rice P., Thompson J., Saraste M. Trends Biochem. Sci. 18:343-348(1993).
[ 4] Gibson T.J., Hyvonen M., Musacchio A., Saraste M., Birney E. Trends Biochem. Sci. 19:349-353(1994).
[ 5] Pawson T. Nature 373:573-580(1995).
[ 6] Ingley E., Hemmings B.A. J. Cell. Biochem. 56:436-443(1994).
[ 7] Saraste M., Hyvonen M. Curr. Opin. Struct. Biol. 5:403-408(1995).
[ 8] Riddihough G. Nat. Struct. Biol. 1:755-757(1994).

411. PHD-finger 
[1] 

Medline: 95216093 

The PHD finger: implications for chromatin-mediated transcriptional regulation.
Aasland R, Gibson TJ, Stewart AF;
Trends Biochem Sci 1995;20:56-59.

[1148] 412. (PI-PLC-X) Phosphatidylinositol-specific phospholipase C profiles
Phosphatidylinositol-specific phospholipase C (EC 3.1.4.11), an eukaryotic intracellular enzyme, plays an important
role in signal transduction processes [1]. It catalyzes the hydrolysis of 1-phosphatidyl-D-myo-inositol-3,4,5-triphosphate
into the second messenger molecules diacylglycerol and inositol-1,4,5-triphosphate. This catalytic process is tightly
regulated by reversible phosphorylation and binding of regulatory proteins [2 to 4].

In mammals, there are at least 6 different isoforms of PI-PLC; they differ in their domain structure, their regulation, and
their tissue distribution. Lower eukaryotes also possess multiple isoforms of PI-PLC.

All eukaryotic PI-PLCs contain two regions of homology, sometimes referred to as 'X-box' and 'Y-box'. The order of
these two regions is always the same (NH2-X-Y-COOH), but the spacing is variable. In most isoforms, the distance
between these two regions is only 50-100 residues but in the gamma isoforms one PH domain, two SH2 domains, and
one SH3 domain are inserted between the two PLC-specific domains. The two conserved regions have been shown
to be important for the catalytic activity. At the C-terminal of the Y-box, there is a C2 domain (see <PDOC00380>)
possibly involved in Ca-dependent membrane attachment.

Profile analysis shows that sequences with significant similarity to the X-box domain occur also in prokaryotic and
trypanosome PI-specific phospholipases C. Apart from this region, the prokaryotic enzymes show no similarity to their
eukaryotic counterparts.

Two profiles were developed, one covering the X-box, the other the Y-box.

[ 1] Meldrum E., Parker P.J., Carozzi A. Biochim. Biophys. Acta 1092:49-71(1991).
[ 2] Rhee S.G., Choi K.D. Adv. Second Messenger Phosphoprotein Res. 26:35-61(1992).
[ 3] Rhee S.G., Choi K.D. J. Biol. Chem. 267:12393-12396(1992).
[ 4] Sternweis P.C., Smrcka A.V. Trends Biochem. Sci. 17:502-506(1992).

[1149] 413. (PI-PLC-Y) Phosphatidylinositol-specific phospholipase C profiles
Phosphatidylinositol-specific phospholipase C (EC 3.1.4.11), an eukaryotic intracellular enzyme, plays an important
role in signal transduction processes [1]. It catalyzes the hydrolysis of 1-phosphatidyl-D-myo-inositol-3,4,5-triphosphate
into the second messenger molecules diacylglycerol and inositol-1,4,5-triphosphate. This catalytic process is tightly
regulated by reversible phosphorylation and binding of regulatory proteins [2 to 4].

In mammals, there are at least 6 different isoforms of PI-PLC; they differ in their domain structure, their regulation, and
their tissue distribution. Lower eukaryotes also possess multiple isoforms of PI-PLC.

All eukaryotic PI-PLCs contain two regions of homology, sometimes referred to as 'X-box' and 'Y-box'. The order of
these two regions is always the same (NH2-X-Y-COOH), but the spacing is variable. In most isoforms, the distance
between these two regions is only 50-100 residues but in the gamma isoforms one PH domain, two SH2 domains, and
one SH3 domain are inserted between the two PLC-specific domains. The two conserved regions have been shown
to be important for the catalytic activity. At the C-terminal of the Y-box, there is a C2 domain (see <PDOC00380>)
possibly involved in Ca-dependent membrane attachment.

Profile analysis shows that sequences with significant similarity to the X-box domain occur also in prokaryotic and
trypanosome PI-specific phospholipases C. Apart from this region, the prokaryotic enzymes show no similarity to their
eukaryotic counterparts.
Two profiles were developed, one covering the X-box, the other the Y-box.

[ 1] Meldrum E., Parker P.J., Carozzi A. Biochim. Biophys. Acta 1092:49-71(1991).
[ 2] Rhee S.G., Choi K.D. Adv. Second Messenger Phosphoprotein Res. 26:35-61(1992).
[ 3] Rhee S.G., Choi K.D. J. Biol. Chem. 267:12393-12396(1992).
[ 4] Sternweis P.C., Smrcka A.V. Trends Biochem. Sci. 17:502-506(1992).

[1150] 414. (PK) Pyruvate kinase active site signature 

Pyruvate kinase (EC 2.7.1.40) (PK) [1] catalyzes the final step in glycolysis, the conversion of phosphoenolpyruvate
to pyruvate with the concomitant phosphorylation of ADP to ATP. PK requires both magnesium and potassium ions for
its activity. PK is found in all living organisms. In vertebrates there are four tissue-specific isozymes: L (liver), R (red
cells), M1 (muscle, heart, and brain), and M2 (early fetal tissues). In Escherichia coli there are two isozymes: PK-I
(gene pykF) and PK-II (gene pykA). All PK isozymes seem to be tetramers of identical subunits of about 500 amino
acid residues.

As a signature pattern for PK a conserved region was selected that includes a lysine residue which seems to be the
acid/base catalyst responsible for the interconversion of pyruvate and enolpyruvate, and a glutamic acid residue
implicated in the binding of the magnesium ion.

[1151] Consensus pattern: [LIVAC]-x-[LIVM](2)-[SAPCV]-K-[LIV]-E-[NKRST]-x-[DEQHS]-[GSTA]-[LIVM] [K is the
active site residue] [E is a magnesium ligand]

[1152] [ 1] Muirhead H. Biochem. Soc. Trans. 18:193-196(1990). 
[1153] 415. (PLDc) Phospholipase D. Active site motif

Phosphatidylcholine-hydrolyzing phospholipase D (PLD) isoforms are activated by ADP-ribosylation factors (ARFs).
PLD produces phosphatidic acid from phosphatidylcholine, which may be essential for the formation of certain types
of transport vesicles or may be constitutive vesicular transport to signal transduction pathways.
PC-hydrolyzing PLD is a homologue of cardiolipin synthase, phosphatidylserine synthase, bacterial PLDs, and viral
proteins.

Each of these appears to possess a domain duplication which is apparent by the presence of two motifs containing
well-conserved histidine, lysine, and/or asparagine residues which may contribute to the active site, aspartic acid. An
E. coli endonuclease (nuc) and similar proteins appear to be PLD homologues but possess only one of these motifs.
The profile contained here represents only the putative active site regions, since an accurate multiple alignment of the
repeat units has not been achieved.
Number of members: 139
[1]

Medline: 96303814 

A novel family of phospholipase D homologues that includes phospholipid synthases and putative endonucleases:
identification of duplicated repeats and potential active site residues.
Ponting CP, Kerr ID;
Protein Sci 1996;5:914-922.

[2] Medline: 96334293
A duplicated catalytic motif in a new superfamily of phosphohydrolases and phospholipid synthases that includes pox-
Koonin EV;
Trends Biochem Sci 1996;21:242-243.

[3] Medline: 94327597
Cloning and expression of phosphatidylcholine-hydrolyzing phospholipase D from Ricinus communis L.
Wang X, Xu L, Zheng L;
J Biol Chem 1994;269:20312-20317.

[4] Medline: 97386825
Regulation of eukaryotic phosphatidylinositol-specific phospholipase C and phospholipase D.
Singer WD, Brown HA, Sternweis PC;
Annu Rev Biochem 1997;66:475-509.

[1154] 416. (PMI type1) Phosphomannose isomerase type I signatures

Phosphomannose isomerase (EC 5.3.1.8) (PMI) [1,2] is the enzyme that catalyzes the interconversion of
mannose-6-phosphate and fructose-6-phosphate. In eukaryotes, it is involved in the synthesis of GDP-mannose which is a
constituent of N- and O-linked glycans as well as GPI anchors. In prokaryotes, it is involved in a variety of pathways
including capsular polysaccharide biosynthesis and D-mannose metabolism.

Three classes of PMI have been defined on the basis of sequence similarities [1]. The first class comprises all known
eukaryotic PMI as well as the enzyme encoded by the manA gene in enterobacteria such as Escherichia coli. Class I
PMI's are proteins of about 42 to 50 Kd which bind a zinc ion essential for their activity.

As signature patterns for class I PMI, two conserved regions were selected. The first one is located in the N-terminal
section of these proteins, the second in the C-terminal half. Both patterns contain a residue involved [3] in the binding
of the zinc ion.

[1155] Consensus pattern: Y-x-D-x-N-H-K-P-E [E is a zinc ligand]

- Consensus pattern: H-A-Y-[LIVM]-x-G-x(2)-[LIVM]-E-x-M-A-x-S-D-N-x-[LIVM]-R-A-G-x-T-P-K [H is a zinc ligand]

[ 1] Proudfoot A.E.I., Turcatti G., Wells T.N.C., Payton M.A., Smith D.J. Eur. J. Biochem. 219:415-423(1994).
[ 2] Coulin F., Magnenat E., Proudfoot A.E.I., Payton M.A., Scully P., Wells T.N.C. Biochemistry 32:14139-14144
(1993).
[ 3] Cleasby A., Wonacott A., Skarzynski T., Hubbard R.E., Davies G.J., Proudfoot A.E.I., Bernard A.R., Payton
M.A., Wells T.N.C. Nat. Struct. Biol. 3:470-479(1996).

[1156] 417. (PNP UDP 1) Purine and other phosphorylases family 1 signature
The following phosphorylases belong to the same family:

- Purine nucleoside phosphorylase (EC 2.4.2.1) (PNP) from most bacteria (gene deoD). This enzyme catalyzes the
  cleavage of guanosine or inosine to respective bases and sugar-1-phosphate molecules [1].
- Uridine phosphorylase (EC 2.4.2.3) (UdRPase) from bacteria (gene udp) and mammals. Catalyzes the cleavage
  of uridine into uracil and ribose-1-phosphate. The products of the reaction are used either as carbon and energy
  sources or in the rescue of pyrimidine bases for nucleotide synthesis [2].
- 5'-methylthioadenosine phosphorylase (EC 2.4.2.28) (MTA phosphorylase) from Sulfolobus solfataricus [3].

As a signature pattern, a conserved region was selected in the central part of these enzymes.
[1157] Consensus pattern: [GST]-x-G-[LIVM]-G-x-[PA]-S-x-[GSTA]-I-x(3)-E-L

Note: mammalian and some bacterial PNP as well as eukaryotic MTA phosphorylase belong
to a different family of phosphorylases (see <PDOC00954>).

[ 1] Takehara M., Ling F., Izawa S., Inoue Y., Kimura A. Biosci. Biotechnol. Biochem. 59:1987-1990(1995).
[ 2] Watanabe S.-I., Hino A., Wada K., Eliason J.F., Uchida T. J. Biol. Chem. 270:12191-12196(1995).
[ 3] Cacciapuoti G., Porcelli M., Bertoldo C., De Rosa M., Zappia V. J. Biol. Chem. 269:24762-24769(1994).

[1158] 418. (PP2C) Protein phosphatase 2C signature

Protein phosphatase 2C (PP2C) is one of the four major classes of mammalian serine/threonine specific protein
phosphatases (EC 3.1.3.16). PP2C [1] is a monomeric enzyme of about 42 Kd which shows broad substrate specificity and
is dependent on divalent cations (mainly manganese and magnesium) for its activity. Its exact physiological role is still
unclear. Three isozymes are currently known in mammals: PP2C-alpha, -beta and -gamma. In yeast, there are at least
four PP2C homologs: phosphatase PTC1 [2] which has weak tyrosine phosphatase activity in addition to its activity
on serines, phosphatases PTC2 and PTC3, and hypothetical protein YBR125c. Isozymes of PP2C are also known
from Arabidopsis thaliana (ABI1, PPH1), Caenorhabditis elegans (FEM-2, F42G9.1, T23F11.1), Leishmania chagasi
and Paramecium tetraurelia.

In Arabidopsis thaliana, the kinase associated protein phosphatase (KAPP) [3] is an enzyme that dephosphorylates
the Ser/Thr receptor-like kinase RLK5 and which contains a C-terminal PP2C domain.

PP2C does not seem to be evolutionarily related to the main family of serine/threonine phosphatases: PP1, PP2A and
PP2B. However, it is significantly similar to the catalytic subunit of pyruvate dehydrogenase phosphatase (EC 3.1.3.43)
(PDPC) [4], which catalyzes dephosphorylation and concomitant reactivation of the alpha subunit of the E1 component
of the pyruvate dehydrogenase complex. PDPC is a mitochondrial enzyme and, like PP2C, is magnesium-dependent.
As a signature pattern, the best conserved region was selected which is located in the N-terminal part and contains a
perfectly conserved tripeptide. This region includes a conserved aspartate residue involved in divalent cation binding
[5].

[1159] Consensus pattern: [LIVMFY]-[LIVMFYA]-[GSAC]-[LIVM]-[FYC]-D-G-H-[GAV]

Note: PP2C belongs [6] to a superfamily which also includes bacterial proteins such as Bacillus spoIIE, rsbU and
rsbW, Synechocystis PCC 6803 icfG as well as a domain in fungal adenylate cyclases.

[ 1] Wenk J., Trompeter H.-I., Pettrich K.-G., Cohen P.T.W., Campbell D.G., Mieskes G. FEBS Lett. 297:135-138
(1992).
[ 2] Maeda T., Tsai A.Y.M., Saito H. Mol. Cell. Biol. 13:5408-5417(1993).
[ 3] Stone J.M., Collinge M.A., Smith R.D., Horn M.A., Walker J.C. Science 266:793-795(1994).
[ 4] Lawson J.E., Niu X.-D., Browning K.S., Trong H.L., Yan J., Reed L.J. Biochemistry 32:8987-8993(1993).
[ 5] Das A.K., Helps N.R., Cohen P.T.W., Barford D. EMBO J. 24:6798-6809(1996).
[ 6] Bork P., Brown N.P., Hegyi H., Schultz J. Protein Sci. 5:1421-1425(1996).

[1160] 419. (PPTA) Protein prenyltransferases alpha subunit repeat signature

Protein prenyltransferases catalyze the transfer of an isoprenyl moiety to a cysteine four residues from the C-terminus
of several proteins. They are heterodimeric enzymes consisting of alpha and beta subunits. The alpha subunit is thought
to participate in a stable complex with the isoprenyl substrate; the beta subunit binds the peptide substrate. Distinct
protein prenyltransferases might share a common alpha subunit. Both the alpha and beta subunit show repetitive
sequence motifs [1]. These repeats have distinct structural and functional implications and are unrelated to each other.
Known protein prenyltransferase alpha subunits are:

- Mammalian protein farnesyltransferase alpha subunit.
- Yeast protein RAM2, a protein farnesyltransferase alpha subunit.
- Yeast protein BET4, a protein geranylgeranyltransferase alpha subunit.

The conserved domain of the alpha subunit consists of about 34 amino acids and is repeated five times. It contains
an invariant tryptophan possibly involved in heterodimerization with the conserved phenylalanines in the repeated
domains of the beta subunits, via hydrophobic bonds. The signature pattern for this domain is centered on the invariant
tryptophan.

[1161] Consensus pattern: [PSIAV]-x-[NDFV]-[NEQIY]-x-[LIVMAGP]-W-[NQSTHF]-[FYHQ]-[LIVMR]
[1162] [ 1] Boguski M.S., Murray A.W., Powers S. New Biol. 4:408-411(1992).

[1163] 420. (PR55) Protein phosphatase 2A regulatory subunit PR55 signatures

Protein phosphatase 2A (PP2A) is a serine/threonine phosphatase involved in many aspects of cellular function
including the regulation of metabolic enzymes and proteins involved in signal transduction. PP2A is a trimeric enzyme
that consists of a core composed of a catalytic subunit associated with a 65 Kd regulatory subunit (PR65), also called
subunit A; this complex then associates with a third variable subunit (subunit B), which confers distinct properties to
the holoenzyme [1]. One of the forms of the variable subunit is a 55 Kd protein (PR55) which is highly conserved in
mammals (where three isoforms are known to exist), Drosophila and yeast (gene CDC55). This subunit could perform
a substrate recognition function or be responsible for targeting the enzyme complex to the appropriate subcellular
compartment.

As signature patterns, two perfectly conserved sequences of 15 residues were selected; one located in the N-terminal
region, the other in the center of the protein.

[1164] Consensus pattern: E-F-D-Y-L-K-S-L-E-I-E-E-K-I-N

Consensus pattern: N-[AG]-H-[TA]-Y-H-I-N-S-I-S-[LIVM]-N-S-D
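
Where an entry gives two signatures with a stated order (here, one in the N-terminal region and one in the center of the protein), a scan can require both, in that order. The following is only an illustrative sketch: the two regular expressions are hand-transcribed from the PR55 patterns above, reading the scanned lower-case "l" as isoleucine (I), which is an assumption; the function name `has_pr55_signatures` and the test sequences are hypothetical.

```python
import re

# Hand-written regex equivalents of the two PR55 consensus patterns.
N_TERMINAL = re.compile(r"EFDYLKSLEIEEKIN")
CENTRAL = re.compile(r"N[AG]H[TA]YHINSIS[LIVM]NSD")


def has_pr55_signatures(sequence):
    """True if the N-terminal signature is followed by the central one."""
    first = N_TERMINAL.search(sequence)
    if first is None:
        return False
    # Only look for the central signature downstream of the first match.
    return CENTRAL.search(sequence, first.end()) is not None
```

Requiring both signatures is stricter than accepting either one alone, at the cost of missing fragments that cover only one of the two regions.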

[1165] [ 1] Mayer-Jaekel R., Hemmings B.A. Trends Cell Biol. 4:287-291(1994).
[1166] 421. N-(5'phosphoribosyl)anthranilate (PRA) isomerase
[1]
Wilmanns M, Priestle JP, Niermann T, Jansonius JN;
J Mol Biol 1992;223:477-507.

[1167] 422. (PRK) Phosphoribulokinase signature

Phosphoribulokinase (EC 2.7.1.19) (PRK) [1,2] is one of the enzymes specific to the Calvin reductive pentose
phosphate cycle, which is the major route by which carbon dioxide is assimilated and reduced by autotrophic organisms.
PRK catalyzes the ATP-dependent phosphorylation of ribulose 5-phosphate into ribulose 1,5-bisphosphate which is
the substrate for RubisCO. PRK's of diverse origins show different properties with respect to the size of the protein,
the subunit structure, or the enzymatic regulation. However an alignment of the sequences of PRK from plants,
photosynthetic and chemoautotrophic bacteria shows that there are a few regions of sequence similarity. As a signature
pattern one of these regions was selected.
[1168] Consensus pattern: K-[LIVM]-x-R-D-x(3)-R-G-x-[ST]-x-E

[ 1] Kossmann J., Klintworth R., Bowien B. Gene 85:247-252(1989).
[ 2] Gibson J.L., Chen J.-H., Tower P.A., Tabita F.R. Biochemistry 29:8085-8093(1990).

[1169] 423. (PRPP synt) Phosphoribosyl pyrophosphate synthetase signature

Phosphoribosyl pyrophosphate synthetase (EC 2.7.6.1) (PRPP synthetase) catalyzes the formation of PRPP from ATP
and ribose 5-phosphate. PRPP is then used in various biosynthetic pathways, as for example in the formation of purines,
pyrimidines, histidine and tryptophan. PRPP synthetase requires inorganic phosphate and magnesium ions for its
stability and activity.

In mammals, three isozymes of PRPP synthetase are found; in yeast there are at least four isozymes.
As a signature pattern for this enzyme, a very conserved region was selected that has been suggested to be involved
in binding divalent cations [1]. This region contains two conserved aspartic acid residues as well as a histidine, which
are all potential ligands for a cation such as magnesium.

[1170] Consensus pattern: D-[LI]-H-[SA]-x-Q-[IMST]-[QM]-G-[FY]-F-x(2)-P-[LIVMFC]-D

[1171] [ 1] Bower S.G., Harlow K.W., Switzer R.L., Hove-Jensen B. J. Biol. Chem. 264:10287-10291(1989).
[1172] 424. (PRTP) Herpesvirus processing and transport protein 

The members of this family are associated with capsid intermediates during packaging of the virus.
Number of members: 31
[1]

Medline: 98362148

Herpes simplex virus type 1 cleavage and packaging proteins
UL15 and UL28 are associated with B but not C capsids during
packaging.
Yu D, Weller SK;
J Virol 1998;72:7428-7439.

[1173] 425. Photosystem I psaG / psaK (PSI PSAK) proteins signature

Photosystem I (PSI) [1] is an integral membrane protein complex that uses light energy to mediate electron transfer
from plastocyanin to ferredoxin. It is found in the chloroplasts of plants and cyanobacteria. PSI is composed of at least
14 different subunits, two of which PSI-G (gene psaG) and PSI-K (gene psaK) are small hydrophobic proteins of about
7 to 9 Kd and evolutionarily related [2]. Both seem to contain two transmembrane regions. Cyanobacteria seem to

[1174] As a signature pattern, the best-conserved region was selected which seems to correspond to the second
transmembrane region.

Consensus pattern: [GT]-F-x-[LIVM]-x-[DEA]-x(2)-[GA]-x-[GTA]-[SA]-x-G-H-x-[LIVM]-[GA]

[ 1] Golbeck J.H. Biochim. Biophys. Acta 895:167-204(1987).
[ 2] Kjaerulff S., Andersen B., Nielsen V.S., Moller B.L., Okkels J.S. J. Biol. Chem. 268:18912-18916(1993).

[1175] 426. PTR2 family proton/oligopeptide symporters signatures

A family of eukaryotic and prokaryotic proteins that seem to be mainly involved in the intake of small peptides with the
concomitant uptake of a proton has been recently characterized [1,2]. Proteins that belong to this family are:

- Fungal peptide transporter PTR2.
- Mammalian intestine proton-dependent oligopeptide transporter PeptT1.
- Mammalian kidney proton-dependent oligopeptide transporter PeptT2.
- Drosophila opt1.
- Arabidopsis thaliana peptide transporters PTR2-A and PTR2-B (also known as the histidine transporting protein
  NTR1).
- Arabidopsis thaliana proton-dependent nitrate/chlorate transporter CHL1.
- Lactococcus proton-dependent di- and tri-peptide transporter dtpT.
- Caenorhabditis elegans hypothetical protein C06G8.2.
- Caenorhabditis elegans hypothetical protein F56F4.5.
- Caenorhabditis elegans hypothetical protein K04E7.2.
- Escherichia coli hypothetical protein ybgH.
- Escherichia coli hypothetical protein ydgR.
- Escherichia coli hypothetical protein yhiP.
- Escherichia coli hypothetical protein yjdL.
- Bacillus subtilis hypothetical protein yclF.

These integral membrane proteins are predicted to comprise twelve transmembrane regions. As signature patterns,
two of the best conserved regions were selected. The first is a region that includes the end of the second transmembrane
region, a cytoplasmic loop as well as the third transmembrane region. The second pattern corresponds to the core of
the fifth transmembrane region.

- Consensus pattern: [GA]-[GAS]-[LIVMFYWA]-[LIVM]-[GAS]-D-x-[LIVMFYMT]-[LIVMFYW]-G-x(3)-[...]-
  [GSTAV]-x-[LIVMF]-x(3)-[GA]

- Consensus pattern: [FYT]-x(2)-[LMFY]-[FYV]-[LIVMFYWA]-x-[IVG]-N-[LIVMAG]-G-[GSA]-[LIMF]

[ 1] Paulsen I.T., Skurray R.A. Trends Biochem. Sci. 19:404-404(1994).
[ 2] Steiner H.-Y., Naider F., Becker J.M. Mol. Microbiol. 16:825-834(1995).

[1176] 427. Pumilio-family RNA binding domains (aka PUM-HD, Pumilio homology domain)
Puf domains are necessary and sufficient for sequence specific RNA binding in fly Pumilio and worm FBF-1 and FBF-2.
Both proteins function as translational repressors in early embryonic development by binding sequences in the 3' UTR
of target mRNAs (e.g. the nanos response element (NRE) in fly Hunchback mRNA, or the point mutation element (PME)
in worm fem-3 mRNA). Other proteins that contain Puf domains are also plausible RNA binding proteins. JSN1_YEAST,
for instance, appears to also contain a single RRM domain by HMM analysis.

Puf domains usually occur as a tandem repeat of 8 domains.

The Pfam model does not necessarily recognize all 8 domains in all sequences; some sequences appear to have 5
or 6 domains on initial analysis, but further analysis suggests the presence of additional divergent domains.
[1177] [1] Zhang B, Gallegos M, Puoti A, Durkin E, Fields S, Kimble J, Wickens MP; Nature 1997;390:477-484.
[2] Zamore PD, Williamson JR, Lehmann R; RNA 1997;3:1421-1433.
[1178] 428. PWWP domain. The PWWP domain is named after a conserved Pro-Trp-Trp-Pro motif. The function of
the domain is currently unknown. Number of members: 19
[1179] [1] Medline: 98282232. WHSC1, a 90 kb SET domain-containing gene, expressed in early development and
homologous to a Drosophila dysmorphy gene maps in the Wolf-Hirschhorn syndrome critical region and is fused to
IgH in t(4;14) multiple myeloma. Stec I, Wright TJ, van Ommen GJB, de Boer PAJ, van Haeringen A, Moorman AFM,
Altherr MR, den Dunnen JT; Hum Mol Genet 1998;7:1071-1082.
[1180] 429. PX domain 

Eukaryotic domain of unknown function present in phox proteins, PLD isoforms, a PI3K isoform.
Number of members: 71
[1]

Medline: 97084820

Novel domains in NADPH oxidase subunits, sorting nexins, and
PtdIns 3-kinases: binding partners of SH3 domains?
Ponting CP;
Protein Sci 1996;5:2353-2357.
[1181] 430. ParA family ATPase 
[1] 

Medline: 91141297

A family of ATPases involved in active partitioning of diverse bacterial plasmids.
Motallebi-Veshareh M, Rouch DA, Thomas CM;
Mol Microbiol 1990;4:1455-1463.
Number of members: 122

[1182] 431. (Parvo coat) Parvovirus coat protein. 72 members.

[1183] 432. Pectinesterase signatures
Pectinesterase catalyzes the hydrolysis of pectin into pectate and methanol. In plants, it plays an important role in
cell wall metabolism during fruit ripening. In plant bacterial pathogens such as Erwinia carotovora and in fungal
pathogens such as Aspergillus niger, pectinesterase is involved in maceration and soft-rotting of plant tissue.

Prokaryotic and eukaryotic pectinesterases share a few regions of sequence similarity [1,2,3]. Two of these regions
were selected as signature patterns. The first is based on a region in the N-terminal section of these enzymes; it
contains a conserved tyrosine which may play a role in the catalytic mechanism [3]. The second pattern is
located in the central part of these enzymes.

- Consensus pattern: [GSTNP]-x(6)-[FWHR]-[IVN]-[KEP]-x-G-[STIVKRQ]-Y-[DNQKRMV]-[...]
- Consensus pattern: [IV]-x-G-[STAD]-[LIVT]-D-[FYI]-[IV]-[FSN]-G

[ 1] Ray J., Knapp J., Grierson D., Bird C., Schuch W. Eur. J. Biochem. 174:119-124(1988).
[ 2] Plastow G.S. Mol. Microbiol. 2:247-254(1988).
[ 3] Markovic O., Joernvall H. Protein Sci. 1:1288-1292(1992).

[1184] 433. Pentapeptide repeats (8 copies) 

These repeats are found in many cyanobacterial proteins. 

The repeats were first identified in hglK [1]. The function of these repeats is unknown. 

The structure of this repeat has been predicted to be a beta-helix [2].

The repeat can be approximately described as A(D/N)LXX, where X can be any amino acid.
Number of members: 75
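
The approximate description A(D/N)LXX lends itself to a simple scan for tandem runs of the five-residue unit. The sketch below is a rough illustration under that approximation only; it is not the Pfam model, the function name `longest_tandem_run` is hypothetical, and the example sequences in the test are invented.

```python
import re

# One or more back-to-back copies of the approximate unit A-(D/N)-L-X-X.
PENTAPEPTIDE_RUN = re.compile(r"(?:A[DN]L..)+")


def longest_tandem_run(sequence):
    """Return the largest number of consecutive A(D/N)LXX units found."""
    best = 0
    for match in PENTAPEPTIDE_RUN.finditer(sequence):
        # Each unit is exactly five residues long.
        best = max(best, len(match.group(0)) // 5)
    return best
```

Because real repeats are degenerate, a literal scan of this kind is expected to undercount compared with a profile-based search.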

[1] 

The hglK gene is required for localization of heterocyst-specific glycolipids in the cyanobacterium
Anabaena sp. strain PCC 7120.
Black K, Buikema WJ, Haselkorn R;
J Bacteriol 1995;177:6440-6448.

[2] Medline: 98318059
Structure and distribution of pentapeptide repeats in bacteria.
Bateman A, Murzin A, Teichmann SA;
Protein Sci 1998;7:1477-1480.

Characterisation of an Arabidopsis cDNA encoding a thylakoid lumen protein related to a novel 'pentapeptide repeat'
family of proteins.
Kieselbach T, Mant A, Robinson C, Schroder WP;
FEBS Lett 1998;428:241-244.
[1185] 434. Polypeptide deformylase 

[1] 

A new subclass of the zinc metalloproteases superfamily revealed by the solution structure of peptide deformylase.
Meinnel T, Blanquet S, Dardel F;
J Mol Biol 1996;262:375-386.

[2] Medline: 98332750
Solution structure of nickel-peptide deformylase.
Dardel F, Ragusa S, Lazennec C, Blanquet S, Meinnel T;
J Mol Biol 1998;280:501-513.
Number of members: 21 

[1186] 435. Peptidyl-tRNA hydrolase signatures
Peptidyl-tRNA hydrolase (EC 3.1.1.29) (PTH) is a bacterial enzyme that cleaves peptidyl-tRNA or N-acyl-aminoacyl-tRNA to yield free peptides or N-acyl-amino acids and tRNA. The natural substrate for this enzyme may be peptidyl-tRNA which drop off the ribosome during protein synthesis [1,2]. Bacterial PTH has been found [2,3] to be evolutionary related to yeast hypothetical protein YHR189w.
PTH and YHR189w are proteins of about 200 amino acid residues. As signature patterns, two conserved regions were selected that each contain an histidine. The first of these regions is located in the N-terminal section, the other in the central part.

- Consensus pattern: [FY]-x(2)-T-R-H-N-x-G-x(2)-[LIVMFA](2)-[DE]

- Consensus pattern: [GS]-x(3)-H-N-G-[LIVM]-[KR]-[DNS]-[LIVMT]


[ 1] Garcia-Villegas M.R., De La Vega P.M., Galindo J.M., Segura M., Buckingham R.H., Guarneros G. EMBO J. 10:3549-3555(1991).

[ 2] De La Vega P.M., Galindo J.M., Old I.G., Guarneros G. Gene 169:97-100(1996).
[ 3] Ouzounis C., Bork P., Casari G., Sander C. Protein Sci. 4:2424-2428(1995).


[1187] 436. (Peptidase M17) Cytosol aminopeptidase signature

Cytosol aminopeptidase is a eukaryotic cytosolic zinc-dependent exopeptidase that catalyzes the removal of unsubstituted amino-acid residues from the N-terminus of proteins. This enzyme is often known as leucine aminopeptidase (EC 3.4.11.1) (LAP) but has been shown [1] to be identical with prolyl aminopeptidase (EC 3.4.11.5). Cytosol aminopeptidase is a hexamer of identical chains, each of which binds two zinc ions.

Cytosol aminopeptidase is highly similar to Escherichia coli pepA, a manganese dependent aminopeptidase. Residues involved in zinc ion-binding [2] in the mammalian enzyme are absolutely conserved in pepA where they presumably bind manganese.

A cytosol aminopeptidase from Rickettsia prowazekii [3] and one from Arabidopsis thaliana also belong to this family. As a signature pattern for these enzymes, a perfectly conserved octapeptide was selected which contains two residues involved in binding metal ions: an aspartate and a glutamate.

Consensus pattern: N-T-D-A-E-G-R-L [The D and the E are zinc/manganese ligands] 
Note: these proteins belong to family M17 in the classification of peptidases [4,E1]. 
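The consensus patterns in this document follow a compact notation: residues separated by hyphens, `x` for any residue, `[...]` for allowed alternatives, `{...}` for excluded residues, and `(n)` or `(n,m)` for repetition. The following is a minimal sketch of how such a pattern could be translated into a regular expression and used to locate the two metal ligands of the octapeptide N-T-D-A-E-G-R-L. The function names and the example sequence are hypothetical, and the translator handles only the notation elements listed here.

```python
import re

# One pattern element: a residue, 'x', an allowed set [..] or an excluded
# set {..}, optionally followed by a repeat count (n) or range (n,m).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a hyphen-separated consensus pattern into a Python regex."""
    pieces = []
    for element in pattern.strip().split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # excluded residues
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        pieces.append(core)
    return "".join(pieces)


def find_metal_ligands(sequence: str):
    """Return 1-based positions of the Asp and Glu ligands, or None."""
    m = re.search(prosite_to_regex("N-T-D-A-E-G-R-L"), sequence)
    if m is None:
        return None
    # Asp is the 3rd and Glu the 5th residue of the octapeptide.
    return (m.start() + 3, m.start() + 5)
```

With the made-up sequence `"MKKNTDAEGRLAA"`, the octapeptide starts at residue 4, so the aspartate and glutamate are reported at positions 6 and 8.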


[ 1] Matsushima M., Takahashi T., Ichinose M., Miki K., Kurokawa K., Takahashi K. Biochem. Biophys. Res. Commun. 178:1459-1464(1991).

[ 2] Burley S.K., David P.R., Sweet R.M., Taylor A., Lipscomb W.N. J. Mol. Biol. 224:113-140(1992).
[ 3] Wood D.O., Solomon M.J., Speed R.R. J. Bacteriol. 175:159-165(1993).
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[1188] 437. Assemblin (Peptidase family S21) 
[1] 

Medline: 96399137
Three-dimensional structure of human cytomegalovirus protease.

Shieh HS, Kurumbail RG, Stevens AM, Stegeman RA, Sturman EJ,
Pak JY, Wittwer AJ, Palmier MO, Wiegand RC, Holwerda BC,
Stallings WC;
Nature 1996;383:279-282.
Number of members: 29

[1189] 438. Pollen proteins Ole e 1 family signature

The following plant pollen proteins, whose biological function is not yet known, are structurally related [1]:

- Olive tree pollen major allergen (Ole e 1).
- Tomato anther-specific protein LAT52.
- Maize pollen-specific protein ZmC13.

These proteins are most probably secreted and consist of about 145 residues. As shown in the following schematic representation, there are six cysteines which are conserved in the sequence of these proteins. They seem to be involved in disulfide bonds.

xxxxxxCxCxxxxxxxxxCxxxxxxxxxxxxxxxxxCxxxxxCxxxxxxxxxxxxxxxxxxxxCxxxxxxx
******

'C': conserved cysteine involved in a disulfide bond.
'*': position of the pattern.

Consensus pattern: [EQ]-G-x-V-Y-C-D-T-C-R [The two C's are probably involved in disulfide bonds]
[1190] [ 1] Villalba M., Batanero E., Lopez-Otin C., Sanchez L.M., Monsalve R.I., Gonzalez De La Pena M.A., Lahoz C., Rodriguez R. Eur. J. Biochem. 216:863-869(1993).
[1191] 439. Pollen allergen 

This family contains allergens Lol p I, II and III from Lolium perenne.
Number of members: 49
[1]

Complete primary structure of a Lolium perenne (perennial rye grass) pollen allergen, Lol p III: comparison with known
Lol p I and II sequences.
Ansari AA, Shenbagamurthi P, Marsh DG;
Biochemistry 1989;28:8665-8670. 
[1192] 440. Porphobilinogen deaminase cofactor-binding site

Porphobilinogen deaminase (EC 4.3.1.8), or hydroxymethylbilane synthase, is an enzyme involved in the biosynthesis of porphyrins and related macrocycles. It catalyzes the assembly of four porphobilinogen (PBG) units in a head to tail fashion to form hydroxymethylbilane.

The enzyme covalently binds a dipyrromethane cofactor to which the PBG subunits are added in a stepwise fashion.

In the Escherichia coli enzyme (gene hemC), this cofactor has been shown [1] to be bound by the sulfur atom of a cysteine. The region around this cysteine is conserved in porphobilinogen deaminases from various prokaryotic and eukaryotic sources.

- Consensus pattern: E-R-x-[LIVMFA]-x(3)-[LIVMF]-x-G-[GSA]-C-x-[IVT]-P-[LIVMF]-[GSA] [C is the cofactor attachment site]

[1193] [ 1] Miller A.D., Hart G.J., Packman L.C., Battersby A.R. Biochem. J. 254:915-918(1988).

Mutations in presenilins are a major cause of early onset Alzheimer's disease [2]. It has been found that presenilin-1
(Swiss:P49768) binds to beta-catenin in vivo [4]. This family also contains SPE proteins from C.elegans. 
Number of members: 23 
[1] 

Medline: 98045995 

Presenilins and Alzheimer's disease. 

Kim TW, Tanzi RE;

Curr Opin Neurobiol 1997;7:683-688.

[2] Medline: 98045995
Presenilins and Alzheimer's disease.
Kim TW, Tanzi RE;

Curr Opin Neurobiol 1997;7:683-688.

[3] Medline: 98099802
Interaction of presenilins with the filamin family of actin-binding proteins.
Zhang W, Han SW, McKeel DW, Goate A, Wu JY;

J Neurosci 1998;18:914-922.

[4] Medline: 99004850
Destabilisation of beta-catenin by mutations in presenilin-1 potentiates neuronal apoptosis.

Zhang Z, Hartmann H, Do VM, Abramowski D, Sturchler-Pierrat
C, Staufenbiel M, Sommer B, van de Wetering M, Clevers H,
Saftig P, De Strooper B, He X, Yankner BA;

Nature 1998;395:698-702. 

[1195] 442. (Pribosyltran) Purine/pyrimidine phosphoribosyl transferases signature

Phosphoribosyltransferases (PRT) are enzymes that catalyze the synthesis of beta-n-5'-monophosphates from phosphoribosylpyrophosphate (PRPP) and an enzyme specific amine. A number of PRTs are involved in the biosynthesis of purine, pyrimidine, and pyridine nucleotides, or in the salvage of purines and pyrimidines. These enzymes are:

- Adenine phosphoribosyltransferase (EC 2.4.2.7) (APRT), which is involved in purine salvage.
- Hypoxanthine-guanine or hypoxanthine phosphoribosyltransferase (EC 2.4.2.8) (HGPRT or HPRT), which are involved in purine salvage.
- Orotate phosphoribosyltransferase (EC 2.4.2.10) (OPRT), which is involved in pyrimidine biosynthesis.
- Amido phosphoribosyltransferase (EC 2.4.2.14), which is involved in purine biosynthesis.
- Xanthine-guanine phosphoribosyltransferase (EC 2.4.2.22) (XGPRT), which is involved in purine salvage.

In the sequence of all these enzymes there is a small conserved region which may be involved in the enzymatic activity and/or be part of the PRPP binding site [1].

- Consensus pattern: [LIVMFYWCTA]-[LIVM]-[LIVMA]-[LIVMFC]-[DE]-D-[LIVMS]-[LIVM]-[STAVD]-[STAR]-[GAC]-x-[STAR]

Note: in position 11 of the pattern most of these enzymes have Gly.

[1196] [1] Hershey H.V., Taylor M.W. Gene 43:287-293(1986).
[1197] 443. (Pro CA) 

Prokaryotic-type carbonic anhydrases signatures

Carbonic anhydrases (EC 4.2.1.1) (CA) are zinc metalloenzymes which catalyze the reversible hydration of carbon dioxide. In Escherichia coli, CA (gene cynT) is involved in recycling carbon dioxide formed in the bicarbonate-dependent decomposition of cyanate by cyanase (gene cynS). By this action, it prevents the depletion of cellular bicarbonate [1]. In photosynthetic bacteria and plant chloroplast, CA is essential to inorganic carbon fixation [2]. Prokaryotic and plant chloroplast CA are structurally and evolutionary related and form a family distinct from the one which groups the many different forms of eukaryotic CA's (see <PDOC00146>). Hypothetical proteins yadF from Escherichia coli and HI1301 from Haemophilus influenzae also belong to this family. Two signature patterns were developed for this family of enzymes. Both patterns contain conserved residues that could be involved in binding zinc (cysteine and histidine).

- Consensus pattern: C-[SA]-D-S-R-[LIVM]-x-[AP]

- Consensus pattern: [EQ]-Y-A-[LIVM]-x(2)-[LIVM]-x(4)-[LIVMF](3)-x-G-H-x(2)-C-G

[ 1] Guilloton M.B., Korte J.J., Lamblin A.F., Fuchs J.A., Anderson P.M. J. Biol. Chem. 267:3731-3734(1992).
[ 2] Fukuzawa H., Suzuki E., Komukai Y., Miyachi S. Proc. Natl. Acad. Sci. U.S.A. 89:4437-4441(1992).


[1198] 444. (Prolyl oligopep)

Prolyl oligopeptidase family serine active site

[1199] The prolyl oligopeptidase family [1,2,3] consists of a number of evolutionary related peptidases whose catalytic activity seems to be provided by a charge relay system similar to that of the trypsin family of serine proteases, but which evolved by independent convergent evolution. The known members of this family are listed below.

- Prolyl endopeptidase (EC 3.4.21.26) (PE) (also called post-proline cleaving enzyme). PE is an enzyme that cleaves peptide bonds on the C-terminal side of prolyl residues. The sequence of PE has been obtained from a mammalian species (pig) and from bacteria (Flavobacterium meningosepticum and Aeromonas hydrophila); there is a high degree of sequence conservation between these sequences.
- Escherichia coli protease II (EC 3.4.21.83) (oligopeptidase B) (gene prtB) which cleaves peptide bonds on the C-terminal side of lysyl and argininyl residues.
- Dipeptidyl peptidase IV (EC 3.4.14.5) (DPP IV). DPP IV is an enzyme that removes N-terminal dipeptides sequentially from polypeptides having unsubstituted N-termini provided that the penultimate residue is proline.
- Yeast vacuolar dipeptidyl aminopeptidase A (DPAP A) (gene: STE13) which is responsible for the proteolytic maturation of the alpha-factor precursor.
- Yeast vacuolar dipeptidyl aminopeptidase B (DPAP B) (gene: DAP2).
- Acylamino-acid-releasing enzyme (EC 3.4.19.1) (acyl-peptide hydrolase). This enzyme catalyzes the hydrolysis of the amino-terminal peptide bond of an N-acetylated protein to generate a N-acetylated amino acid and a protein with a free amino-terminus.

[1200] A conserved serine residue has experimentally been shown (in E. coli protease II as well as in pig and bacterial PE) to be necessary for the catalytic mechanism. This serine, which is part of the catalytic triad (Ser, His, Asp), is generally located about 150 residues away from the C-terminal extremity of these enzymes (which are all proteins that contain about 700 to 800 amino acids).

[1201] Consensus pattern: D-x(3)-A-x(3)-[LIVMFYW]-x(14)-G-x-S-x-G-G-[LIVMFYW](2) [S is the active site residue]
Sequences known to belong to this class detected by the pattern: ALL, except for yeast DPAP A.
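The text states that the active site serine generally lies about 150 residues from the C-terminal extremity. A minimal sketch of how a match to the consensus pattern D-x(3)-A-x(3)-[LIVMFYW]-x(14)-G-x-S-x-G-G-[LIVMFYW](2) could be checked against that expectation is shown below. The function name and the synthetic test sequence are hypothetical; the regular expression is a direct transcription of the pattern.

```python
import re

# Direct transcription of the consensus pattern into a regex.
SERINE_SITE = re.compile(r"D.{3}A.{3}[LIVMFYW].{14}G.S.GG[LIVMFYW]{2}")
# The serine is the 26th residue of the 31-residue match (0-based offset 25):
# D(1) + x(3) + A(1) + x(3) + [..](1) + x(14) + G(1) + x(1) = 25 residues before S.
SERINE_OFFSET = 25


def locate_active_serine(sequence: str):
    """Return (1-based serine position, residues remaining after it), or None."""
    m = SERINE_SITE.search(sequence)
    if m is None:
        return None
    position = m.start() + SERINE_OFFSET + 1
    return position, len(sequence) - position
```

The second value of the returned pair is the distance from the serine to the C-terminus, which can be compared with the figure of roughly 150 residues given above.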
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[1202] Note: these proteins belong to families S9A/S9B/S9C in the classification of peptidases [4]. 

[ 1] Rawlings N.D., Polgar L., Barrett A.J. Biochem. J. 279:907-911(1991).
[ 2] Barrett A.J., Rawlings N.D.
[ 3] Polgar L., Szabo E.

[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).
[1203] 445. (Pterin 4a)

Pterin 4 alpha carbinolamine dehydratase is also known as DCoH (dimerization cofactor of hepatocyte nuclear factor 1-alpha).

and transcriptional coactivator DCoH and its complex with a product analogue." Protein Sci. 1996;5:1963-1972.
[1207] 446. (Pyridox oxidase) 

Pyridoxamine 5'-phosphate oxidase is an enzyme involved in the de novo synthesis of pyridoxine (vitamin B6) and pyridoxal phosphate. It oxidizes pyridoxamine-5-P (PMP) and pyridoxine-5-P (PNP) to pyridoxal-5-P. The sequences of the enzyme from bacterial (genes pdxH or fprA) [1] and fungal (gene PDX3) [2] sources show that this protein has been highly conserved throughout evolution. PdxH is evolutionary related [3] to one of the enzymes in the phenazine biosynthesis protein pathway, phzD (also known as phzG). As a signature pattern, a highly conserved region of these enzymes was selected.

- Consensus pattern: [LIVF]-E-F-W-[QHG]-x(4)-R-[LIVM]-H-[DNE]-R

[ 1] Lam H.-M., Winkler M.E. J. Bacteriol. 174:6033-6045(1992).
[ 2] Loubbardi A., Karst R., Guilloton M., Marcireau C. J. Bacteriol.
[ 3] Pierson L.S. III, Gaffney T., Lam S., Gong F. FEMS Microbiol. Lett. 134:299-307(1995).

[1209] 447. (Pyrophosphatase) 

Inorganic pyrophosphatase signature responsible for the hydrolysis of pyro- 

?ZU?p1)whEst^^^^^^^^^^^ 

The sequences of inorganic pyrophosphatases share some regions of similarities. As a signature pattern, a region was selected that contains three conserved aspartates that are involved in the binding of cations.

- Consensus pattern: D-[SGDN]-D-[PE]-[LIVMF]-D-[LIVMGAC]
[The three D's bind divalent metal cations]

[ 1] Lahti R., Kolakowski L.F. Jr., Heinonen J., Vihinen M., Pohjanoksa K., Cooperman B.S. Biochim. Biophys. Acta 1038:338-345(1990).

[ 2] Cooperman B.S., Baykov A.A., Lahti R. Trends Biochem. Sci. 17:262-266(1992).

[1212] 448. (Peptidase S26) 

Signal peptidases (SPases) [1] (aka leader peptidases) remove the signal peptides from secretory proteins.

proU^es t'hSt^^^ rsPatllli known: ty^ ' (gene lepB) which is responsible for the processing of the 
r^TSortedp^^^^^^^^^ 
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protuding in the periplasmic space. Two residues have been shown [2.3] to be essential for the catalytic activity of 
SPase I: a serine and a lysine. SPase I is evolutionary related to the yeast mitochondrial inner membrane protease subunit 1 and 2 (genes IMP1 and IMP2) which catalyze the removal of signal peptides required for the targeting of proteins from the mitochondrial matrix, across the inner membrane, into the inter-membrane space [4]. In eukaryotes the removal of signal peptides is effected by an oligomeric enzymatic complex composed of at least five subunits: the signal peptidase complex (SPC). The SPC is located in the endoplasmic reticulum membrane. Two components of mammalian SPC, the 18 Kd (SPC18) and the 21 Kd (SPC21) subunits as well as the yeast SEC11 subunit have been shown [5] to share regions of sequence similarity with prokaryotic SPases I and yeast IMP1/IMP2. Three signature patterns have been developed for these proteins. The first signature contains the putative active site serine, the second signature contains the putative active site lysine which is not conserved in the SPC subunits, and the third signature corresponds to a conserved region of unknown biological significance which is located in the C-terminal section of all these proteins.

[1214] Consensus pattern: [GS]-x-S-M-x-[PS]-[AT]-[LF] [S is an active site residue]

Consensus pattern: K-R-[LIVMSTA](2)-G-x-[PG]-G-[DE]-x-[LIVM]-x-[LIVMFY] [K is an active site residue]

Consensus pattern: [LIVMFYW](2)-x(2)-G-D-[NH]-x(3)-[SND]-x(2)-[SG]
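Because the lysine signature is not conserved in the SPC subunits, the combination of signatures that match can hint at which kind of protein is being examined. The following is a minimal sketch, assuming the three consensus patterns above transcribed directly into regular expressions; the dictionary keys, the function name and the test sequence are hypothetical labels chosen for illustration.

```python
import re

# The three signal peptidase signatures, transcribed one-to-one.
SIGNATURES = {
    "serine": re.compile(r"[GS].SM.[PS][AT][LF]"),
    "lysine": re.compile(r"KR[LIVMSTA]{2}G.[PG]G[DE].[LIVM].[LIVMFY]"),
    "c_terminal": re.compile(r"[LIVMFYW]{2}.{2}GD[NH].{3}[SND].{2}[SG]"),
}


def signature_hits(sequence: str) -> dict:
    """Report, for each signature, whether it occurs anywhere in the sequence."""
    return {name: rx.search(sequence) is not None for name, rx in SIGNATURES.items()}
```

A sequence that matches the serine and C-terminal signatures but not the lysine signature would be consistent with the description of the SPC subunits given above, although a pattern hit alone does not establish function.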

[1215] [ 1] Dalbey R.E., von Heijne G. Trends Biochem. Sci. 17:474-478(1992).
[ 2] Sung M., Dalbey R.E. J. Biol. Chem. 267:13154-13159(1992).
[ 3] Black M.T. J. Bacteriol. 175:4957-4961(1993).
[ 4] Nunnari J., Fox T.D., Walter P. Science 262:1997-2004(1993).
[ 5] van Dijl J.M., de Jong A., Vehmaanpera J., Venema G., Bron S. EMBO J. 11:2819-2828(1992).
[ 6] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).
[E1]

[1216] 449. (Peptidase C1) Eukaryotic thiol (cysteine) proteases active sites. Eukaryotic thiol proteases (EC 3.4.22.-) [1] are a family of proteolytic enzymes which contain an active site cysteine. Catalysis proceeds through a thioester intermediate and is facilitated by a nearby histidine side chain; an asparagine completes the essential catalytic triad. The proteases which are currently known to belong to this family are listed below (references are only provided for recently determined sequences).

- Vertebrate lysosomal cathepsins B (EC 3.4.22.1), H (EC 3.4.22.16), L (EC 3.4.22.15), and S (EC 3.4.22.27) [2].
- Vertebrate lysosomal dipeptidyl peptidase I (EC 3.4.14.1) (also known as cathepsin C) [2].
- Vertebrate calpains (EC 3.4.22.17). Calpains are intracellular calcium-activated thiol proteases that contain both a N-terminal catalytic domain and a C-terminal calcium-binding domain.
- Mammalian cathepsin K, which seems involved in osteoclastic bone resorption [3].
- Human cathepsin O [4].
- Bleomycin hydrolase. An enzyme that catalyzes the inactivation of the antitumor drug BLM (a glycopeptide).
- Plant enzymes: barley aleurain (EC 3.4.22.16), EP-B1/B4; kidney bean EP-C1, rice bean SH-EP; kiwi fruit actinidin (EC 3.4.22.14); papaya latex papain (EC 3.4.22.2), chymopapain (EC 3.4.22.6), caricain (EC 3.4.22.30), and proteinase IV (EC 3.4.22.25); pea turgor-responsive protein 15A; pineapple stem bromelain (EC 3.4.22.32); rape COT44; rice oryzain alpha, beta, and gamma; tomato low-temperature induced, Arabidopsis thaliana A494, RD19A and RD21A.
- House-dust mites allergens DerPI and EurMI.
- Cathepsin B-like proteinases from the worms Caenorhabditis elegans (genes gcp-1, cpr-3, cpr-4, cpr-5 and cpr-6), Schistosoma mansoni (antigen SM31) and japonica (antigen SJ31), Haemonchus contortus (genes AC-1 and AC-2), and Ostertagia ostertagi (CP-1 and CP-3).
- Slime mold cysteine proteinases CP1 and CP2.
- Cruzipain from Trypanosoma cruzi and brucei.
- Trophozoite cysteine proteinase (TCP) from various Plasmodium species.
- Proteases from Leishmania mexicana, Theileria annulata and Theileria parva.
- Baculoviruses cathepsin-like enzyme (v-cath).
- Drosophila small optic lobes protein (gene sol), a neuronal protein that contains a calpain-like domain.
- Yeast thiol protease BLH1/YCP1/LAP3.
- Caenorhabditis elegans hypothetical protein C06G4.2, a calpain-like protein.

Two bacterial peptidases are also part of this family:

- Aminopeptidase C from Lactococcus lactis (gene pepC) [5].
- Thiol protease tpr from Porphyromonas gingivalis.

Three other proteins are structurally related to this family, but may have lost their proteolytic activity.

- Soybean oil body protein P34. This protein has its active site cysteine replaced by a glycine.
- Rat testin, a Sertoli cell secretory protein highly similar to cathepsin L but with the active site cysteine replaced by a serine. Rat testin should not be confused with mouse testin which is a LIM-domain protein (see <PDOC00382>).
- Plasmodium falciparum serine-repeat protein (SERA), the major blood stage antigen. This protein of 111 Kd possesses a C-terminal thiol-protease-like domain [6], but the active site cysteine is replaced by a serine.

The sequences around the three active site residues are well conserved and can be used as signature patterns.

[1217] Consensus pattern: Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]-[STAGCV] [C is the active site residue]
Note: the residue in position 4 of the pattern is almost always cysteine; the only exceptions are calpains (Leu), bleomycin hydrolase (Ser) and yeast YCP1 (Ser).
Note: the residue in position 5 of the pattern is always Gly except in papaya protease IV where it is Glu.

[1218] Consensus pattern: [LIVMGSTAN]-x-H-[GSACE]-[LIVM]-x-[LIVMAT](2)-G-x-[GSADNH] [H is the active site residue]

Consensus pattern: [FYCH]-[WI]-[LIVT]-x-[KRQAG]-N-[ST]-W-x(3)-[FYW]-G-x(2)-G-[LFYW]-[LIVMFYG]-x-[LIVMF] [N is the active site residue]
Note: these proteins belong to family C1 (papain-type) and C2 (calpains) in the classification of peptidases [7,E1].

[1219] [ 1] Dufour E. Biochimie 70:1335-1342(1988).
[ 2] Kirschke H., Barrett A.J., Rawlings N.D. Protein Prof. 2:1587-1643(1995).
[ 3] Shi G.-P., Chapman H.A., Bhairi S.M., Deleeuw C., Reddy V.Y., Weiss S.J. FEBS Lett. 357:129-134(1995).
[ 4] Velasco G., Ferrando A.A., Puente X.S., Sanchez L.M., Lopez-Otin C. J. Biol. Chem. 269:27136-27142(1994).
[ 5] Chapot-Chartier M.P., Nardi M., Chopin M.C., Chopin A., Gripon J.C. Appl. Environ. Microbiol. 59:330-333(1993).
[ 6] Higgins D.G., McConnell D.J., Sharp P.M. Nature 340:604-604(1989).
[ 7] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994).

[1220] 450. (Peptidase M24) Aminopeptidase P and proline dipeptidase signature (1).

Aminopeptidase P (EC 3.4.11.9) is the enzyme responsible for the release of any N-terminal amino acid adjacent to a proline residue. Proline dipeptidase (EC 3.4.13.9) (prolidase) splits dipeptides with a prolyl residue in the carboxyl terminal position. Bacterial aminopeptidase P II (gene pepP) [1], proline dipeptidase (gene pepQ) [2], and human proline dipeptidase (gene PEPD) [3] are evolutionary related. These proteins are manganese metalloenzymes. Yeast hypothetical proteins YER078c and YFR006w and Mycobacterium tuberculosis hypothetical protein MtCY49.29c also belong to this family. As a signature pattern for these enzymes a conserved region was selected that contains three histidine residues.

[1221] Consensus pattern: [HA]-[GSYR]-[LIVMT]-[SG]-H-x-[LIV]-G-[LIVM]-x-[IV]-H-[DE]

[1222] [ 1] Yoshimoto T., Tone H., Honda T., Osatomi K., Kobayashi R., Tsuru D. J. Biochem. 105:412-416(1989).
[ 2] Nakahigashi K., Inokuchi H. Nucleic Acids Res. 18:6439-6439(1990).
[ 3] Endo F., Tanoue A., Nakai H., Hata A., Indo Y., Titani K., Matsuda I. J. Biol. Chem. 264:4476-4481(1989).
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[1223] Methionine aminopeptidase signatures (2). Methionine aminopeptidase (EC 3.4.11.18) (MAP) is responsible for the removal of the amino-terminal (initiator) methionine from nascent eukaryotic cytosolic and cytoplasmic prokaryotic proteins if the penultimate amino acid is small and uncharged. All MAP studied to date are monomeric proteins that require cobalt ions for activity. Two subfamilies of MAP enzymes are known to exist [1,2]. While being evolutionary related, they only share a limited amount of sequence similarity mostly clustered around the residues shown, in the Escherichia coli MAP [3], to be involved in cobalt-binding. The first family consists of enzymes from prokaryotes as well as eukaryotic MAP-1, while the second group is made up of archebacterial MAP and eukaryotic MAP-2. The second subfamily also includes proteins which do not seem to be MAP, but that are clearly evolutionary related such as mouse proliferation-associated protein 1 and fission yeast curved DNA-binding protein. For each of these subfamilies, a specific signature pattern was developed that includes residues known to be involved in cobalt-binding.

Consensus pattern: [MFY]-x-G-H-G-[LIVMC]-[GSH]-x(3)-H-x(4)-[LIVM]-x-[HN]-[YWV] [H is a cobalt ligand]
Consensus pattern: [DA]-[LIVMY]-x-K-[LIVM]-D-x-G-x-[HQ]-[LIVM]-[DNS]-G-x(3)-[DN] [The second D and the last D/N are cobalt ligands]

[ 1] Arfin S.M., Kendall R.L., Hall L., Weaver L.H., Stewart A.E., Matthews B.W., Bradshaw R.A. Proc. Natl. Acad. Sci. U.S.A. 92:7714-7718(1995).
[ 2] Keeling P.J., Doolittle W.F. Trends Biochem. Sci. 21:285-286(1996).
[ 3] Roderick S.L., Mathews B.W. Biochemistry 32:3907-3912(1993).
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[1226] 451. Cytochrome P450 cysteine heme-iron ligand signature

Cytochrome P450's [1,2,3] are a group of enzymes involved in the oxidative metabolism of a high number of natural compounds (such as steroids, fatty acids, prostaglandins, leukotrienes, etc) as well as drugs, carcinogens and mutagens. Based on sequence similarities, P450's have been classified into about forty different families. P450's are proteins of 400 to 530 amino acids; the only exception is Bacillus BM-3 (CYP102) which is a protein of 1048 residues that contains a N-terminal P450 domain followed by a reductase domain. P450's are heme proteins. A conserved cysteine residue in the C-terminal part of P450's is involved in binding the heme iron in the fifth coordination site. From a region around this residue, a ten residue signature was developed specific to P450's.

Consensus pattern: [FW]-[SGNH]-x-[GD]-x-[RHPT]-x-C-[LIVMFAP]-[GAD] [C is the heme iron ligand]
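The ten-residue signature places the heme-binding cysteine at its eighth position. A minimal sketch of how candidate ligand cysteines could be listed for a sequence is given below, assuming the pattern [FW]-[SGNH]-x-[GD]-x-[RHPT]-x-C-[LIVMFAP]-[GAD] transcribed into a regular expression; the function name and the example sequence are hypothetical.

```python
import re

# Ten-residue P450 signature; the ligand cysteine is the 8th residue.
P450_SIGNATURE = re.compile(r"[FW][SGNH].[GD].[RHPT].C[LIVMFAP][GAD]")
CYS_INDEX_IN_MATCH = 7  # 0-based index of the cysteine within a match


def heme_ligand_cysteines(sequence: str) -> list:
    """Return 1-based positions of cysteines that sit in a signature match."""
    return [m.start() + CYS_INDEX_IN_MATCH + 1
            for m in P450_SIGNATURE.finditer(sequence)]
```

Since the text locates the conserved cysteine in the C-terminal part of P450's, a hit near the start of a sequence would deserve a closer look before being interpreted as a heme ligand.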

[ 1] Nebert D.W., Gonzalez F.J. Annu. Rev. Biochem. 56:945-993(1987).

[ 2] Coon M.J., Ding X., Pernecky S.J., Vaz A.D.N. FASEB J. 6:669-673(1992).
[ 3] Guengerich F.P. J. Biol. Chem. 266:10019-10022(1991).

[ 4] Nelson D.R., Kamataki T., Waxman D.J., Guengerich F.P., Estabrook R.W., Feyereisen R., Gonzalez F.J., Coon M.J., Gunsalus I.C., Gotoh O., Okuda K., Nebert D.W. DNA Cell Biol. 12:1-51(1993).
[ 5] Degtyarenko K.N., Archakov A.I. FEBS Lett. 332:1-8(1993).

rotting of plant tissue. 

[1229] [1] Yoder MD, Keen NT, Jurnak F, Science 1993;260:1503-1507. 

[1230] 453. (Pep M24) Aminopeptidase P and proline dipeptidase signature (pep1)

Aminopeptidase P (EC 3.4.11.9) is the enzyme responsible for the release of any N-terminal amino acid adjacent to a proline residue. Proline dipeptidase (EC 3.4.13.9) (prolidase) splits dipeptides with a prolyl residue in the carboxyl terminal position. Bacterial aminopeptidase P II (gene pepP) [1], proline dipeptidase (gene pepQ) [2], and human proline dipeptidase (gene PEPD) [3] are evolutionary related. These proteins are manganese metalloenzymes. Yeast hypothetical proteins YER078c and YFR006w and Mycobacterium tuberculosis hypothetical protein MtCY49.29c also belong to this family. As a signature pattern for these enzymes a conserved region was selected that contains three histidine residues.

[1231] Consensus pattern: [HA]-[GSYR]-[LIVMT]-[SG]-H-x-[LIV]-G-[LIVM]-x-[IV]-H-[DE]

[ 1] Yoshimoto T., Tone H., Honda T., Osatomi K., Kobayashi R., Tsuru D. J. Biochem. 105:412-416(1989).
[ 2] Nakahigashi K., Inokuchi H. Nucleic Acids Res. 18:6439-6439(1990).

[ 3] Endo F., Tanoue A., Nakai H., Hata A., Indo Y., Titani K., Matsuda I. J. Biol. Chem. 264:4476-4481(1989).
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[1232] Methionine aminopeptidase signatures (pep2) 

Methionine aminopeptidase (EC 3.4.11.18) (MAP) is responsible for the removal of the amino-terminal (initiator) methionine from nascent eukaryotic cytosolic and cytoplasmic prokaryotic proteins if the penultimate amino acid is small and uncharged. All MAP studied to date are monomeric proteins that require cobalt ions for activity. Two subfamilies of MAP enzymes are known to exist [1,2]. While being evolutionary related, they only share a limited amount of sequence similarity mostly clustered around the residues shown, in the Escherichia coli MAP [3], to be involved in cobalt-binding. The first family consists of enzymes from prokaryotes as well as eukaryotic MAP-1, while the second group is made up of archebacterial MAP and eukaryotic MAP-2. The second subfamily also includes proteins which do not seem to be MAP, but that are clearly evolutionary related such as mouse proliferation-associated protein 1 and fission yeast curved DNA-binding protein. For each of these subfamilies, a specific signature pattern was developed that includes residues known to be involved in cobalt-binding.

[1233] Consensus pattern: [MFY]-x-G-H-G-[LIVMC]-[GSH]-x(3)-H-x(4)-[LIVM]-x-[HN]-[YWV] [H is a cobalt ligand]
Consensus pattern: [DA]-[LIVMY]-x-K-[LIVM]-D-x-G-x-[HQ]-[LIVM]-[DNS]-G-x(3)-[DN] [The second D and the last D/N are cobalt ligands]

[ 1] Arfin S.M., Kendall R.L., Hall L., Weaver L.H., Stewart A.E., Matthews B.W., Bradshaw R.A. Proc. Natl. Acad. Sci. U.S.A. 92:7714-7718(1995).

[ 2] Keeling P.J., Doolittle W.F. Trends Biochem. Sci. 21:285-286(1996).
[ 3] Roderick S.L., Mathews B.W. Biochemistry 32:3907-3912(1993).
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[1234] 454. Peroxidases signatures

Peroxidases (EC 1.11.1.-) [1] are heme-binding enzymes that carry out a variety of biosynthetic and degradative functions using hydrogen peroxide as the electron acceptor. Peroxidases are widely distributed throughout bacteria, fungi, plants, and vertebrates. In peroxidases the heme prosthetic group is protoporphyrin IX and the fifth ligand of the heme iron is a histidine (known as the proximal histidine). Another histidine residue (the distal histidine) serves as an acid-base catalyst in the reaction between hydrogen peroxide and the enzyme. The regions around these two active site residues are more or less conserved in a majority of peroxidases [2,3]. The enzymes in which one or both of these regions can be found are listed below.

- Yeast cytochrome c peroxidase (EC 1.11.1.5).
- Myeloperoxidase (EC 1.11.1.7) (MPO). MPO is found in granulocytes and monocytes and plays a major role in the oxygen-dependent microbicidal system of neutrophils.
- Lactoperoxidase (EC 1.11.1.7) (LPO). LPO is a milk protein which acts as an antimicrobial agent.
- Eosinophil peroxidase (EC 1.11.1.7) (EPO). An enzyme found in the cytoplasmic granules of eosinophils.
- Thyroid peroxidase (EC 1.11.1.8) (TPO). TPO plays a central role in the biosynthesis of thyroid hormones. It catalyzes the iodination and coupling of the hormonogenic tyrosines in thyroglobulin to yield the thyroid hormones T3 and T4.
- Fungal ligninases. Ligninase catalyzes the first step in the degradation of lignin. It depolymerizes lignin by catalyzing the C(alpha)-C(beta) cleavage of the propyl side chains of lignin.
- Plant peroxidases (EC 1.11.1.7). Plants express a large number of isozymes of peroxidases. Some of them play a role in cell-suberization by catalyzing the deposition of the aromatic residues of suberin on the cell wall, some are expressed as a defense response toward wounding, others are involved in the metabolism of auxin and the biosynthesis of lignin.
- Prokaryotic catalase-peroxidases. Some bacterial species produce enzymes that exhibit both catalase and broad-spectrum peroxidase activities [4]. Examples of such enzymes are: catalase HP I from Escherichia coli (gene katG) and perA from Bacillus stearothermophilus.

[1235] Consensus pattern: [DET]-[LIVMTA]-x(2)-[LIVM]-[LIVMSTAG]-[SAG]-[LIVMSTAG]-H-[STA]-[LIVMFY] [H is the proximal heme-binding ligand]

Consensus pattern: [SGATV]-x(3)-[LIVMA]-R-[LIVMA]-x-[FW]-H-x-[SAC] [H is an active site residue]
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[ 1] Dawson J.H. Science 240:433-439(1988). 

[ 2] Kimura S., Ikeda-Saito M. Proteins 3:113-120(1988).

[ 3] Henrissat B., Saloheimo M., Lavaitte S., Knowles J.K.C. Proteins 8:251-257(1990).
[ 4] Welinder K.G. Biochim. Biophys. Acta 1080:215-220(1991).

[1236] 455. pfkB family of carbohydrate kinases signatures

t has been shown [1 2.3] that the following carbohydrate and purine kinasesare evolutionary related and can be 
grSupedTnto a smgli family, which isknown [1] as the 'pfkB family": - Fructolc.nase (EC 2J^A) ^--c^^^ 
Jhofmctokinase isozyme 2 (EC 2JJJD (phosphofructokinase-2) (gene pfkB). pfl<B .s a ™ ^^^^^ 
Lzyme in Escherichia coli and is not evolutionary related to the major isozyme (gene P^^^^-^^'^^J J^P'^X j^^^^^^ 
tokinase also belong to this family. - Ribokinase (EC 2.7.1.15) (gene rbsK). - 

ADK) - 2-dehydro-3-deoxygluconokinase (EC 2.7.1.45) (gene: kdgK). - 1-phosphofructokinase (EC 2Zi5§) (fructose 
CpSsphate kinase) (genefruK). - Inosine-guanosine «nase(EC2X1J3) (genegsk). -Tagatose-6-phosp^^^^ 
(EC 2 7 1.144 ) (phosphotagatokinase) (gene lacC). - Escherichia coli hypothetical protein yeiC. - Eschenchia «rt hy- 
pomiwSe n yeil - Escherichia coli hypothetical protein yhfQ. - Escherichia coli hypothet«al P^^^^ V^V^-^^""^ 
subtilis hy Jothetk^l protein yxdC. - Yeast hypothetical protein YJR1 05W.A1I the above kinases are P™t«'"%°: ^80 
to 430 amino acid residues that share a few region of sequence similarity. Two of these regions were s^^jej as 
signature patterns. The first pattem is based on a region rich in glycine which is located in the N-temiinal section of 
these enzymes; while the second pattem is based on a conserved region in the C-terminal section. 
[12371 Consensus pattem: [AG]-G-x(0,1)-[GAP]-x-N-x-lSTA]-x(6)-tGSl-x(9)-G- „„.^^,„ 
bonsensus pattem: lDNSK]-[PSTVl-x-lSAG](2)-[GD]-D-x(3HSAGV]-[AG]-[LIVMFYA]-[L.VM^^ 

[ 1] Wu L.-F., Reizer A., Reizer J., Cai B., Tomich J.M., Saier M.H. Jr. J. Bacteriol. 173:3117-3127(1991).
[ 2] Orchard L.M.D., Kornberg H.L. Proc. R. Soc. Lond., B, Biol. Sci. 242:87-90(1990).
[ 3] Blatch G.L., Scholle R.R., Woods D.R. Gene 95:17-23(1990). 

[1238] 456. Phospholipase A2 active sites signatures

Phospholipase A2 (EC 3.1.1.4) (PA2) [1,2] is an enzyme which releases fatty acids from the second carbon group of
glycerol. PA2's are small and rigid proteins of 120 amino-acid residues that have four to seven disulfide bonds. PA2
binds a calcium ion which is required for activity. The side chains of two conserved residues, a histidine and an aspartic
acid, participate in a 'catalytic network'. Many PA2's have been sequenced from snakes, lizards, bees and mammals.
In the latter, there are at least four forms: pancreatic, membrane-associated as well as two less characterized forms.
The venom of most snakes contains multiple forms of PA2. Some of them are presynaptic neurotoxins which inhibit
neuromuscular transmission by blocking acetylcholine release from the nerve termini. Two different signature patterns
were derived for PA2's. The first is centered on the active site histidine and contains three cysteines involved in disulfide
bonds. The second is centered on the active site aspartic acid and also contains three cysteines involved in disulfide
bonds.

[1239] Consensus pattern: C-C-x(2)-H-x(2)-C [H is the active site residue] This pattern will not detect some snake
toxins homologous with PA2 but which have lost their catalytic activity, as well as otoconin-22, a Xenopus protein from
the aragonitic otoconia which is also unlikely to be enzymatically active.
Consensus pattern: [LIVMA]-C-{LIVMFYWPCST}-C-D-x(5)-C [D is the active site residue] [...] functional
and non-functional PA2's. Undetected sequences are bee PA2, gila monster PA2's, PA2 PL-X from habu and PA2 PA-5
from mulga.

[ 1] Davidson F.F., Dennis E.A. J. Mol. Evol. 31:228-238(1990).
[ 2] Gomez F., Vandermeers A., Vandermeers-Piret M.-C., Herzog R., Rathe J., Stievenart M., Winand J., Christophe
J. Eur. J. Biochem. 186:23-33(1989).
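
The consensus patterns in these entries use a compact notation: `x` for any residue, `[..]` for allowed residues, `{..}` for excluded residues, `(n)` or `(n,m)` for repeats, and `<` / `>` for the sequence ends. As a minimal sketch, and not the tool used by the authors, the notation can be translated into a Python regular expression. The function name `prosite_to_regex` and the two test peptides are hypothetical and were written only for this illustration.

```python
import re

# One pattern element: a residue, "x", an allowed set [..] or an excluded set {..},
# optionally followed by a repeat count (n) or range (n,m).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        element = element.strip()
        if not element:
            continue
        prefix = suffix = ""
        if element.startswith("<"):      # pattern anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # pattern anchored at the C-terminus
            suffix, element = "$", element[:-1]
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":                  # any residue
            core = "."
        elif core.startswith("{"):       # excluded residues
            core = "[^" + core[1:-1] + "]"
        if low is not None:              # repeat count or range
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


# The two phospholipase A2 patterns quoted above.
HIS_SITE = prosite_to_regex("C-C-x(2)-H-x(2)-C")
ASP_SITE = prosite_to_regex("[LIVMA]-C-{LIVMFYWPCST}-C-D-x(5)-C")

# Hypothetical peptides, constructed only to satisfy each pattern.
his_hit = re.search(HIS_SITE, "AYCCNTHDNCYK")
asp_hit = re.search(ASP_SITE, "GGACRCDKAAAACGG")
```

Python's `re` module is used here only because its character classes map directly onto the bracket notation; any regular-expression engine with negated classes would serve equally well.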

[1240] 457. Phosphorylase pyridoxal-phosphate attachment site. Phosphorylases (EC 2.4.1.1) [1] are important allosteric
enzymes in carbohydrate metabolism. They catalyze the formation of glucose 1-phosphate from polyglucose
such as glycogen, starch or maltodextrin. Enzymes from different sources differ in their regulatory mechanisms and
their natural substrates. However, all known phosphorylases share catalytic and structural properties. They are
pyridoxal-phosphate dependent enzymes; the pyridoxal-P group is attached to a lysine residue around which the sequence
is highly conserved and can be used as a signature pattern to detect this class of enzymes.

[1241] Consensus pattern: E-A-[SC]-G-x-[GS]-x-M-K-x(2)-[LM]-N [K is the pyridoxal-P attachment site]

[ 1] Fukui T., Shimomura S., Nakano K. Mol. Cell. Biochem. 42:129-144(1982).

[1242] 458. Protein kinases signatures and profile

Eukaryotic protein kinases [1 to 5] are enzymes that belong to a very extensive family of proteins which share a conserved
catalytic core common to both serine/threonine and tyrosine protein kinases. There are a number of conserved
regions in the catalytic domain of protein kinases. Two of these regions were selected to build signature patterns. The
first region, which is located in the N-terminal extremity of the catalytic domain, is a glycine-rich stretch of residues in
the vicinity of a lysine residue, which has been shown to be involved in ATP binding. The second region, which is
located in the central part of the catalytic domain, contains a conserved aspartic acid residue which is important for
the catalytic activity of the enzyme [6]. Two signature patterns were derived for that region: one specific for serine/threonine
kinases and the other for tyrosine kinases. A profile was also developed which is based on the alignment in
[1] and covers the entire catalytic domain.

[1243] Consensus pattern: [LIV]-G-{P}-G-{P}-[FYWMGSTNH]-[SGA]-{PW}-[LIVCAT]-{PD}-x-[GSTACLIVMFY]-x(5,18)-[LIVMFYWCSTAR]-[AIVP]-[LIVMFAGCKR]-K
[K binds ATP]. The majority of known protein kinases belong to
the class detected by this pattern, but it fails to find a number of them, especially viral kinases which are quite divergent
in this region and are completely missed by this pattern.

Consensus pattern: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-K-x(2)-N-[LIVMFYCT](3) [D is an active site residue]. Most serine/threonine
specific protein kinases belong to this class detected by the pattern with 10 exceptions (half of them viral
kinases) and also Epstein-Barr virus BGLF4 and Drosophila ninaC which have respectively Ser and Arg instead of the
conserved Lys and which are therefore detected by the tyrosine kinase specific pattern described below.
[1244] Consensus pattern: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-[RSTAC]-x(2)-N-[LIVMFYC](3) [D is an active site residue].
All tyrosine specific protein kinases with the exception of human ERBB3 and mouse blk belong to this class
detected by the pattern. This pattern will also detect most bacterial aminoglycoside phosphotransferases [8,9] and
herpesviruses ganciclovir kinases [10], which are proteins structurally and evolutionary related to protein kinases.
This profile also detects receptor guanylate cyclases and 2-5A-dependent ribonucleases. Sequence similarities between
these two families and the eukaryotic protein kinase family have been noticed before. It also detects Arabidopsis
thaliana kinase-like protein TMKL1 which seems to have lost its catalytic activity. If a protein analyzed includes the
two protein kinase signatures, the probability of it being a protein kinase is close to 100%. Eukaryotic-type protein
kinases have also been found in prokaryotes such as Myxococcus xanthus [11] and Yersinia pseudotuberculosis.

[ 1] Hanks S.K., Hunter T. FASEB J. 9:576-596(1995).
[ 2] Hunter T. Meth. Enzymol. 200:3-37(1991).
[ 3] Hanks S.K., Quinn A.M. Meth. Enzymol. 200:38-62(1991).
[ 4] Hanks S.K. Curr. Opin. Struct. Biol. 1:369-383(1991).
[ 5] Hanks S.K., Quinn A.M., Hunter T. Science 241:42-52(1988).
[ 6] Knighton D.R., Zheng J., Ten Eyck L.F., Ashford V.A., Xuong N.-H., Taylor S.S., Sowadski J.M. Science 253:
407-414(1991).
[ 7] Bairoch A., Claverie J.-M. Nature 331:22(1988).
[ 8] Benner S. Nature 329:21-21(1987).
[ 9] Kirby R. J. Mol. Evol. 30:489-492(1992).
[10] Littler E., Stuart A.D., Chee M.S. Nature 358:160-162(1992).
[11] Munoz-Dorado J., Inouye S., Inouye M. Cell 67:995-1006(1991).
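
The entry above notes that a protein carrying both kinase signatures is almost certainly a protein kinase. The following sketch shows one way such a two-signature check could be scripted. It is an illustration under stated assumptions, not the procedure used in this specification: the regular expressions are hand translations of the three consensus patterns, `classify_kinase` is a hypothetical helper, and both test peptides are synthetic strings assembled only to satisfy the patterns.

```python
import re

# Hand translations of the three consensus patterns quoted in this entry.
ATP_BINDING = re.compile(
    r"[LIV]G[^P]G[^P][FYWMGSTNH][SGA][^PW][LIVCAT][^PD].[GSTACLIVMFY]"
    r".{5,18}[LIVMFYWCSTAR][AIVP][LIVMFAGCKR]K"
)
SER_THR_ACTIVE_SITE = re.compile(r"[LIVMFYC].[HY].D[LIVMFY]K.{2}N[LIVMFYCT]{3}")
TYR_ACTIVE_SITE = re.compile(r"[LIVMFYC].[HY].D[LIVMFY][RSTAC].{2}N[LIVMFYC]{3}")


def classify_kinase(sequence: str) -> dict:
    """Report which kinase signatures occur in a one-letter protein sequence."""
    has_atp_site = ATP_BINDING.search(sequence) is not None
    if SER_THR_ACTIVE_SITE.search(sequence):
        active_site = "serine/threonine"
    elif TYR_ACTIVE_SITE.search(sequence):
        active_site = "tyrosine"
    else:
        active_site = None
    return {
        "atp_binding": has_atp_site,
        "active_site": active_site,
        # Both signatures present: the case the entry describes as close to 100%.
        "both_signatures": has_atp_site and active_site is not None,
    }


# Synthetic peptides (hypothetical, for demonstration only).
both = classify_kinase("LGTGSFGRVMLVKHKETGNHYAMKILGGSGGLIYRDLKPENLLID")
tyr_only = classify_kinase("YVHRDLRAANILV")
```

A hit on only one signature, as in the second peptide, is weaker evidence; the entry lists several non-kinase families that the tyrosine-specific pattern also detects.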

[1245] Receptor tyrosine kinase class II signature

A number of growth factors stimulate mitogenesis by interacting with a family of cell surface receptors which possess
an intrinsic, ligand-sensitive, protein tyrosine kinase activity [1]. These receptor tyrosine kinases (RTK) all share the
same topology: an extracellular ligand-binding domain, a single transmembrane region and a cytoplasmic kinase domain.
However, they can be classified into at least five groups. The prototype for class II RTK's is the insulin receptor,
a heterotetramer of two alpha and two beta chains linked by disulfide bonds. The alpha and beta chains are cleavage
products of a precursor molecule. The alpha chain contains the ligand binding site, the beta chain traverses the
membrane and contains the tyrosine protein kinase domain. The receptors currently known to belong to class II are:
- Insulin receptor from vertebrates. - Insulin growth factor I receptor from mammals. - Insulin receptor-related receptor
(IRR), which is most probably a receptor for a peptide belonging to the insulin family. - Insects insulin-like receptors. -
Molluscan insulin-related peptide(s) receptor (MIP-R). - Insulin-like peptide receptor from Branchiostoma lanceolatum.
- The Drosophila developmental protein sevenless, a putative receptor for positional information required for the formation
of the R7 photoreceptor cells. - The trk family of receptors (NTRK1, NTRK2 and NTRK3), which are high affinity
receptors for nerve growth factor and related neurotrophic factors (BDNF and NT-3). And the following uncharacterized
receptors: - ROS. - LTK (TYK1). - EDDR1 (cak, TRKE, RTK6). - NTRK3 (Tyro10, TKT). - A sponge putative receptor
tyrosine kinase. While only the insulin and the insulin growth factor I receptors are known to exist in the tetrameric
conformation specific to class II RTK's, all the above proteins share extensive homologies in their kinase domain,
especially around the putative site of autophosphorylation. Hence, a signature pattern was developed for this class of
RTK's, which includes the tyrosine residue, itself probably autophosphorylated.
[1246] Consensus pattern: [DN]-[LIV]-Y-x(3)-Y-Y-R [The second Y is the autophosphorylation site]

[1247] [ 1] Yarden Y., Ullrich A. Annu. Rev. Biochem. 57:443-478(1988).
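
Several consensus patterns in this section annotate one position inside the match, as the class II RTK pattern does for its second tyrosine. A capturing group is one simple way to report that position. The sketch below is illustrative only: the regular expression is a hand translation of the pattern, and both the helper name `autophosphorylation_sites` and the test peptide are hypothetical.

```python
import re

# [DN]-[LIV]-Y-x(3)-Y-Y-R, with the second tyrosine of the pattern captured.
CLASS_II_RTK = re.compile(r"[DN][LIV]Y.{3}(Y)YR")


def autophosphorylation_sites(sequence: str) -> list:
    """Return 1-based positions of the annotated tyrosine for every pattern match."""
    return [match.start(1) + 1 for match in CLASS_II_RTK.finditer(sequence)]


# Synthetic peptide (hypothetical), built only to contain one match.
sites = autophosphorylation_sites("GMTRDIYETDYYRKGG")
```

Positions are reported 1-based, following the usual convention for protein sequences, so the value refers to the residue number within the supplied string.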

[1248] Receptor tyrosine kinase Class III signature 

A number of growth factors stimulate mitogenesis by interacting with a family of cell surface receptors which possess
an intrinsic, ligand-sensitive, protein tyrosine kinase activity [1]. These receptor tyrosine kinases (RTK) all share the
same topology: an extracellular ligand-binding domain, a single transmembrane region and a cytoplasmic kinase domain.
However, they can be classified into at least five groups. The class III RTK's are characterized by the presence
of five to seven immunoglobulin-like domains [2] in their extracellular section. Their kinase domain differs from that of
other RTK's by the insertion of a stretch of 70 to 100 hydrophilic residues in the middle of this domain. The receptors
currently known to belong to class III are: - Platelet-derived growth factor receptor (PDGF-R). PDGF-R exists as a
homo- or heterodimer of two related chains: alpha and beta [3]. - Macrophage colony stimulating factor receptor (CSF-1-R)
(also known as the fms oncogene). - Stem cell factor (mast cell growth factor) receptor (also known as the kit
oncogene). - Vascular endothelial growth factor (VEGF) receptors Flt-1 and Flk-1/KDR [4]. - Fl cytokine receptor
Flk-2/Flt-3 [5]. - The putative receptor Flt-4 [6]. A signature pattern was developed for this class of RTK's which is based
on a conserved region in the kinase domain.
[1249] Consensus pattern: G-x-H-x-N-[LIVM]-V-N-L-L-G-A-C-T

[ 1] Yarden Y., Ullrich A. Annu. Rev. Biochem. 57:443-478(1988).
[ 2] Hunkapiller T., Hood L. Adv. Immunol. 44:1-63(1989).
[ 3] Lee K.-H., Bowen-Pope D.F., Reed R.R. Mol. Cell. Biol. 10:2237-2246(1990).
[ 4] Terman B.I., Dougher-Vermazen M., Carrion M.E., Dimitrov D., Armellino D.C., Gospodarowicz D., Boehlen
P. Biochem. Biophys. Res. Commun. 187:1579-1586(1992).
[ 5] Lyman S.D., James L., Vanden Bos T., de Vries R., Brasel K., Gliniak B., Hollingsworth L.T., Picha K.S., McKenna
H.J., Splett R.R. Cell 75:1157-1167(1993).
[ 6] Galland R., Karamysheva A., Pebusque M.J., Borg J.P., Rottapel R., Dubreuil R., Rosnet O., Birnbaum D.
Oncogene 8:1233-1240(1993).

[1250] Receptor tyrosine kinase class V signatures

A number of growth factors stimulate mitogenesis by interacting with a family of cell surface receptors which possess
an intrinsic, ligand-sensitive, protein tyrosine kinase activity [1]. These receptor tyrosine kinases (RTK) all share the
same topology: an extracellular ligand-binding domain, a single transmembrane region and a cytoplasmic kinase domain.
However, they can be classified into at least five groups on the basis of sequence similarities. The extracellular
domain of class V RTK's consists of a region of about 500 amino acids, amongst which 16 conserved cysteines probably
involved in disulfide bonds; this region is followed by two copies of a fibronectin type III domain. The ligands for these
receptors are proteins of about 200 to 300 residues collectively known as ephrins. The receptors currently known to
belong to class V are [2,3,E1]: - EPHA1 (Eph-1; Esk). - EPHA2 (Eck; Mpk-5; Sek-2). - EPHA3 (Etk-1; Hek; Mek4;
Tyro4; Rek4; Cek4). - EPHA4 (Sek; Hek8; Mpk-3; Cek8). - EPHA5 (Ehk-1; Hek7; Bsk; Cek7). - EPHA6 (Ehk-2). -
EPHA7 (Ehk-3; Hek11; Mdk-1; Ebk). - EPHA8 (Eek). - EPHB1 (Eph-2; Elk; Net). - EPHB2 (Eph-3; Hek5; Drt; Erk; Nuk;
Sek-3; Cek5; Qek5). - EPHB3 (Hek-2; Mdk-5). - EPHB4 (Htk; Mdk-2; Myk-1). - EPHB5 (Cek9). The EPHA subtype
receptors bind to GPI-anchored ephrins while the EPHB subtype receptors bind to type-1 membrane ephrins. Two
signature patterns were developed for this class of RTK's, which each include some of the conserved cysteines.
[1251] Consensus pattern: F-x-[DN]-x-[GAW]-[GA]-C-[LIVM]-[SA]-[LIVM](2)-[SA]-[LV]-[KRHQ]-[LIVA]-x(3)-[KR]-C-[PSAW]
[The two C's are probably involved in disulfide bonds]

Consensus pattern: C-x(2)-[DE]-G-[DEQ]-W-x(2,3)-[PAQ]-[LIVMT]-[GT]-x-C-x-C-x(2)-G-[HFY]-[EQ] [The three C's are
probably involved in disulfide bonds]

[ 1] Yarden Y., Ullrich A. Annu. Rev. Biochem. 57:443-478(1988).
[ 2] Sajjadi F.G., Pasquale E.B., Subramani S. New Biol. 3:769-778(1991).
[ 3] Wicks I.R., Wilkinson D., Salvaris E., Boyd A.W. Proc. Natl. Acad. Sci. U.S.A. 89:1611-1615(1992).
[1252] 459. Protein kinase C terminal domain 

[1253] 460. Plant thionins signature

Thionins are small, basic, plant proteins generally toxic to animal cells [1]. They seem to exert their toxic effect at the
level of the cell membrane but their exact function is not known. They consist of a polypeptide chain of forty-five to fifty
amino acids with three to four internal disulfide bonds. They are found in seeds but also in the cell wall of leaves [2].
Thionins are processed from larger precursor proteins [3]. Crambin [4], a hydrophobic plant seed protein, also belongs
to this family. The pattern to detect this family of proteins includes three of the six cysteine residues involved in disulfide
bonds.

xxCCxxxxxxxxxxxCxxxxxxxxxCxxxCxxCxxxxxCxxxxxxxx
  **************

'C': conserved cysteine involved in a disulfide bond. '*': position of the pattern.

[1254] Consensus pattern: C-C-x(5)-R-x(2)-[FY]-x(2)-C [The three C's are involved in disulfide bonds] The proteins 
from the gamma-thionin family are not related to the above proteins and are described in a separate section. 


[ 1] Vernon L.P., Evett G.E., Zeikus R.D., Gray W.R. Arch. Biochem. Biophys. 238:18-29(1985).
[ 2] Bohlmann H., Clausen S., Behnke S., Giese H., Hiller C., Reimann-Phillip U., Schrader G., Barkholt V., Apel
K. EMBO J. 7:1559-1565(1988).
[ 3] Bohlmann H., Apel K. Mol. Gen. Genet. 207:446-454(1987).
[ 4] Teeter M.M., Mazer J.A., Litalien J.J. Biochemistry 20:5437-5443(1981).

[1255] 461. Polyprenyl synthetases signatures 

A variety of isoprenoid compounds are synthesized by various organisms. For example, in eukaryotes the isoprenoid
biosynthetic pathway is responsible for the synthesis of a variety of end products including cholesterol, dolichol,
ubiquinone or coenzyme Q. In bacteria this pathway leads to the synthesis of isopentenyl tRNA, isoprenoid quinones, and
sugar carrier lipids. Among the enzymes that participate in that pathway are a number of polyprenyl synthetase enzymes
which catalyze a 1'4-condensation between 5 carbon isoprene units. Currently the sequence of some of these
enzymes is known: - Eukaryotic farnesyl pyrophosphate synthetase (FPP synthetase) (EC 2.5.1.1 / EC 2.5.1.10) which
catalyzes the sequential condensation of isopentenyl pyrophosphate (IPP) with dimethylallyl pyrophosphate (DMAPP),
and then with the resultant geranyl pyrophosphate to form farnesyl pyrophosphate. FPP synthetase is a cytoplasmic
dimeric enzyme. - Prokaryotic farnesyl pyrophosphate synthetase (gene ispA). - Prokaryotic octaprenyl diphosphate
synthase (gene ispB). - Prokaryotic heptaprenyl diphosphate synthase (EC 2.5.1.30). - Eukaryotic geranylgeranyl
pyrophosphate synthetase (GGPP synthetase) (EC 2.5.1.1 / EC 2.5.1.10 / EC 2.5.1.29) which catalyzes the sequential
addition of the three molecules of IPP onto DMAPP to form geranylgeranyl pyrophosphate. In plants GGPP synthetase
is a chloroplast enzyme involved in the biosynthesis of terpenoids; in fungi, such as Neurospora crassa (gene al-3),
this enzyme is involved in the biosynthesis of carotenoids. - Prokaryotic GGPP synthetases, which are involved in the
biosynthesis of carotenoids (gene crtE). Such an enzyme is also encoded in the cyanelle genome of Cyanophora
paradoxa. - Eukaryotic hexaprenyl pyrophosphate synthetase, which is involved in the biosynthesis of coenzyme Q
and which catalyzes the formation of all trans-polyprenyl pyrophosphates generally ranging in length between 6
and 10 isoprene units depending on the species. HP synthetase is a mitochondrial membrane-associated enzyme. It
has been shown [1 to 5] that all the above enzymes share some regions of sequence similarity. Two of these regions
are rich in aspartic-acid residues and could be involved in the catalytic mechanism and/or the binding of the substrates.
Signature patterns were developed for both regions. Possible additional members of this family of proteins are: - Bacillus
subtilis spore germination protein C3 (gene gerC3). Both proteins are most probably also enzymes involved in isoprenoid
metabolism [6].

[1256] Consensus pattern: [LIVM](2)-x-D-D-x(2,4)-D-x(4)-R-R-[GH]
Consensus pattern: [LIVMFY]-G-x(2)-[FYL]-Q-[LIVM]-x-D-D-[LIVMFY]-x-[DNG]

[ 1] Ashby M.N., Edwards P.A. J. Biol. Chem. 265:13157-13164(1990).
[ 2] Fujisaki S., Hara H., Nishimura Y., Horiuchi K., Nishino T. J. Biochem. 108:995-1000(1990).
[ 3] Carattoli A., Romano N., Ballario P., Morelli G., Macino G. J. Biol. Chem. 266:5854-5859(1991).
[ 4] Kuntz M., Roemer S., Suire C., Hugueney P., Weil J.H., Schantz R., Camara B. Plant J. 2:25-34(1992).
[ 5] Math S.K., Hearst J.E., Poulter C.D. Proc. Natl. Acad. Sci. U.S.A. 89:6761-6764(1992).
[ 6] Bairoch A. Unpublished observations (1993).
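
The first aspartate-rich pattern of this entry contains a variable-length gap, x(2,4), between the D-D pair and the third aspartate. As a small sketch, assuming a hand translation of that pattern into a Python regular expression, a named group can report how long the gap is in each match. The helper name `spacer_lengths` and both test peptides are hypothetical and exist only for this illustration.

```python
import re

# [LIVM](2)-x-D-D-x(2,4)-D-x(4)-R-R-[GH], with the variable gap captured by name.
ASP_RICH_REGION = re.compile(r"[LIVM]{2}.DD(?P<spacer>.{2,4})D.{4}RR[GH]")


def spacer_lengths(sequence: str) -> list:
    """Return the length of the x(2,4) gap for each non-overlapping match."""
    return [len(match.group("spacer")) for match in ASP_RICH_REGION.finditer(sequence)]


# Synthetic peptides (hypothetical): one with a 2-residue gap, one with a 4-residue gap.
short_gap = spacer_lengths("LLADDIMDGSELRRG")
long_gap = spacer_lengths("LLADDIMNTDGSELRRH")
```

The quantifier is greedy, so when more than one gap length would satisfy the pattern at the same start position, only the longest is reported.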


[1257] 462. Potato inhibitor I family signature 

The potato inhibitor I family is one of the numerous families of serine proteinase inhibitors. Members of this protein
family are found in plants: in the seeds of barley or beans [1,2,3], and in potato or tomato leaves where they accumulate
in response to mechanical damage [4,5]. An inhibitor belonging to this family is also found in leech [6]. It is interesting
to note that, currently, this is the only proteinase inhibitor family to be found in both the plant and animal kingdoms.
Structurally these inhibitors are small (60 to 90 residues) and, in contrast with other families of protease inhibitors, they lack
disulfide bonds. They have a single inhibitory site. The consensus pattern includes three out of the four residues conserved
in all members of this family and is located in the N-terminal half.

Consensus pattern: [FYW]-P-[EQH]-[LIV](2)-G-x(2)-[STAGV]-x(2)-A. Barley subtilisin-chymotrypsin inhibitor-2b has
Glu instead of Gly. There is a trypsin inhibitor from the cucurbitaceae Momordica charantia [7], which is said to belong
to the potato inhibitor I family but which shows only a very weak similarity with the other members of this family.

[ 1] Svendsen I., Hejgaard J., Chavan J.K. Carlsberg Res. Commun. 49:493-502(1984).
[ 2] Svendsen I., Boisen S., Hejgaard J. Carlsberg Res. Commun. 47:45-53(1982).

: oi ui vor^o^oto u Ajzr^nn Y Ynshikawa M.. Iwasaki T J. Biochem. 106:1003-1008(1989). 

[ 5] Lee J.S., Brown W.E., Graham J.S., Pearce G., Fox E.A., Dreher T.W., Ahern K.G., Pearson G.D., Ryan C.A.
Proc. Natl. Acad. Sci. U.S.A. 83:7277-7281(1986).
[ 6] Seemuller U., Eulitz M., Fritz H., Strobl A. Hoppe-Seyler's Z. Physiol. Chem. 361:1841-1846(1980).
[ 7] Zeng F.-Y., Qian R.-Q., Wang Y. FEBS Lett. 234:35-38(1988).

[1258] 463. (pp binding) Phosphopantetheine attachment site

Phosphopantetheine (or pantetheine 4' phosphate) is the prosthetic group of acyl carrier proteins (ACP) in some multienzyme
complexes where it serves as a 'swinging arm' for the attachment of activated fatty acid and amino-acid
groups [1]. Phosphopantetheine is attached to a serine residue in these proteins [2]. ACP proteins or domains have
been found in various enzyme systems which are listed below (references are only provided for recently determined
sequences). - Fatty acid synthetase (FAS), which catalyzes the formation of long-chain fatty acids from acetyl-CoA,
malonyl-CoA and NADPH. Bacterial and plant chloroplast FAS are composed of eight separate subunits which correspond
to the different enzymatic activities; ACP is one of these polypeptides. Fungal FAS consists of two multifunctional
proteins, FAS1 and FAS2; the ACP domain is located in the N-terminal section of FAS2. Vertebrate FAS consists of a
single multifunctional enzyme; the ACP domain is located between the beta-ketoacyl reductase domain and the C-terminal
thioesterase domain [3]. - Polyketide antibiotics synthase enzyme systems. Polyketides are secondary metabolites
produced from simple fatty acids by microorganisms and plants. ACP is one of the polypeptidic components
involved in the biosynthesis of Streptomyces polyketide antibiotics actinorhodin, curamycin, granaticin, monensin,
oxytetracycline and tetracenomycin C. - Bacillus subtilis putative polyketide synthases pksK, pksL and pksM which
respectively contain three, five and one ACP domains. - The multifunctional 6-methylsalicylic acid synthase (MSAS)
from Penicillium patulum. This is a multifunctional enzyme involved in the biosynthesis of a polyketide antibiotic and
which contains an ACP domain in the C-terminal extremity. - Multifunctional mycocerosic acid synthase (gene mas)
from Mycobacterium bovis. - Gramicidin S synthetase I (gene grsA) from Bacillus brevis. This enzyme catalyzes the
first step in the biosynthesis of the cyclic antibiotic gramicidin S. - Tyrocidine synthetase I (gene tycA) from Bacillus
brevis. The reaction carried out by tycA is identical to that catalyzed by grsA. - Gramicidin S synthetase II (gene grsB)
from Bacillus brevis. This enzyme is a multifunctional protein that activates and polymerizes proline, valine, ornithine
and leucine. GrsB contains four ACP domains. - Erythronolide synthase proteins 1, 2 and 3 from Saccharopolyspora
erythraea, which are involved in the biosynthesis of the polyketide antibiotic erythromycin. Each of these proteins contains
two ACP domains. - Conidial green pigment synthase from Aspergillus nidulans. - ACV synthetase from various fungi.
This enzyme catalyzes the first step in the biosynthesis of penicillin and cephalosporin. It contains three ACP domains.
- Enterobactin synthetase component F (gene entF) from Escherichia coli. This enzyme is involved in the ATP-dependent
activation of serine during enterobactin (enterochelin) biosynthesis. - Cyclic peptide antibiotic surfactin synthase
subunits 1, 2 and 3 from Bacillus subtilis. Subunits 1 and 2 contain three related domains while subunit 3 only contains
a single domain. - HC-toxin synthetase (gene HTS1) from Cochliobolus carbonum. This enzyme synthesizes HC-toxin,
a cyclic tetrapeptide. HTS1 contains four ACP domains. - Fungal mitochondrial ACP [9], which is part of the respiratory
chain NADH dehydrogenase (complex I). - Rhizobium nodulation protein nodF, which probably acts as an ACP in the
synthesis of the nodulation Nod factor fatty acyl chain. The sequence around the phosphopantetheine attachment site
is conserved in all these proteins and can be used as a signature pattern. A profile was also developed that spans the
[...]

[1259] Consensus pattern: [DEQGSTALMKRH]-[LIVMFYSTA...-[LIVMATN]-[DENQGTAKRHLM]-[LIVMWSTA]-[LIVGSTACR]-x(2)-[LIVMFA]
[S is the pantetheine attachment site]

[ 1] Concise Encyclopedia Biochemistry, Second Edition, Walter de Gruyter, Berlin New-York (1988).
[ 2] Pugh E.L., Wakil S.J. J. Biol. Chem. 240:4727-4733(1965).
[ 3] Witkowski A., Rangan V.S., Randhawa Z.I., Amy C.M., Smith S. Eur. J. Biochem. 198:571-579(1991).
[ 6] Scotti C., Piatti M., Cuzzoni A., Perani P., Tognoni A., Grandi G., Galizzi A., Albertini A.M. Gene 130:65-71
(1993).

[ 9] Sackmann U., Zensen R., Rohlen D., Jahnke U., Weiss H. Eur. J. Biochem. 200:463-469(1991).
[1260] 464. (Prenyltrans) Terpene synthases signature

The following enzymes catalyze mechanistically related reactions which involve the highly complex cyclic rearrangement
of squalene or its 2,3 oxide: - Lanosterol synthase (EC 5.4.99.7) (oxidosqualene-lanosterol cyclase), which
catalyzes the cyclization of (S)-2,3-epoxysqualene to lanosterol, the initial precursor of cholesterol, steroid hormones
and vitamin D in vertebrates and of ergosterol in fungi (gene ERG7). - Cycloartenol synthase (EC 5.4.99.8)
(2,3-epoxysqualene-cycloartenol cyclase), a plant enzyme that catalyzes the cyclization of (S)-2,3-epoxysqualene to
cycloartenol. - Hopene synthase (EC 5.4.99.-) (squalene-hopene cyclase), a bacterial enzyme that catalyzes the cyclization
of squalene into hopene, a key step in hopanoid (triterpenoid) metabolism. These enzymes are evolutionary related [1]
proteins of about 70 to 85 Kd. As a signature pattern, a highly conserved region was selected which is rich in aromatic
residues and which is located in the C-terminal section.

[1261] Consensus pattern: [DE]-G-S-W-x-G-x-W-[GA]-[LIVM]-x-[FY]-x-Y-[GA]

[1262] [ 1] Corey E.J., Matsuda S.P.T.. Bartel B. Proc. Natl. Acad. Sci. U.S.A. 90:11628-11632(1993). 
[1263] 465. Prion protein signatures 

Prion protein (PrP) [1,2,3] is a small glycoprotein found in high quantity in the brains of humans or animals infected
with a number of degenerative neurological diseases such as Kuru, Creutzfeldt-Jakob disease (CJD), scrapie or bovine
spongiform encephalopathy (BSE). PrP is encoded in the host genome and expressed both in normal and infected
cells. It has a tendency to aggregate, yielding polymers called rods. Structurally, PrP is a protein consisting of a signal
peptide, followed by an N-terminal domain that contains tandem repeats of a short motif (PHGGGWGQ in mammals,
PHNPGY in chicken), itself followed by a highly conserved domain. Finally comes a C-terminal hydrophobic domain
post-translationally removed when PrP is attached to the extracellular side of the cell membrane by a GPI-anchor. The
structure of PrP is shown in the following schematic representation:

Sig | Tandem repeats | C C S | GPI

'C': conserved cysteine involved in a disulfide bond. '*': position of the patterns.

As signature pattern for PrP, a perfectly conserved
alanine- and glycine-rich region of 16 residues was selected as well as a region centered on the second cysteine
involved in the disulfide bond.

[1264] Consensus pattern: A-G-A-A-A-A-G-A-V-V-G-G-L-G-G-Y

Consensus pattern: E-x-[ED]-x-K-[LIVM](2)-x-[KR]-[LIVM](2)-x-[QE]-M-C-x(2)-Q-Y [C is involved in a disulfide bond]

[ 1] Stahl N., Prusiner S.B. FASEB J. 5:2799-2807(1991).
[ 2] Brunori M., Chiara Silvestrini M., Pocchiari M. Trends Biochem. Sci. 13:309-313(1988).
[ 3] Prusiner S.B. Annu. Rev. Microbiol. 43:345-374(1989).

[1265] 466. Cyclophilin-type peptidyl-prolyl cis-trans isomerase signature and profile (pro isomerase) 

Cyclophilin [1] is the major high-affinity binding protein in vertebrates for the immunosuppressive drug cyclosporin A
(CSA). It exhibits a peptidyl-prolyl cis-trans isomerase activity (EC 5.2.1.8) (PPIase or rotamase). PPIase is an enzyme
that accelerates protein folding by catalyzing the cis-trans isomerization of proline imidic peptide bonds in oligopeptides
[2]. It is probable that CSA mediates some of its effects via an inhibitory action on PPIase. Cyclophilin is a cytosolic
protein which belongs to a family [3,4,5] that also includes the following isozymes: - Cyclophilin B (or S-cyclophilin), a
PPIase which is retained in an endoplasmic reticulum compartment. - Cyclophilin C, a cytoplasmic PPIase. - Mitochondrial
matrix cyclophilin (cyp3). - A PPIase which seems specific for the folding of rhodopsin and is an integral membrane
protein anchored by a C-terminal transmembrane region. This protein was first characterized in Drosophila (gene
ninaA). - Bacterial periplasmic PPIase (gene ppiA). - Bacterial cytosolic PPIase (gene ppiB). - Natural-killer cell
cyclophilin-related protein. This large protein (about 160 Kd) is a component of a putative tumor-recognition complex involved
in the function of NK cells. It contains a cyclophilin-type PPIase domain. - Mammalian nucleoporin Nup358 [6], a nuclear
pore complex protein of 358 Kd that contains a C-terminal cyclophilin-type PPIase domain. - Yeast hypothetical protein
YJR032W. - Fission yeast hypothetical protein SpAC21E11.05c. - Caenorhabditis elegans hypothetical protein
T27D1.1. The sequences of the different forms of cyclophilin-type PPIases are well conserved. As a signature pattern,
a conserved region was selected in the central part of these enzymes.

[1266] Consensus pattern: [FY]-x(2)-[STCNLV]-x-F-H-[RH]-[LIVMN]-[LIVM]-x(2)-F-[LIVM]-x-Q-[AG]-G. FKBP's, a
family of proteins that bind the immunosuppressive drug FK506, are also PPIases, but their sequence is not at all
related to that of cyclophilin.

[ 1] Stamnes M.A., Rutherford S.L., Zuker C.S. Trends Cell Biol. 2:272-276(1992).
[ 2] Fischer G., Schmid F.X. Biochemistry 29:2205-2212(1990).
[ 3] Trandinh C.C., Pao G.M., Saier M.H. Jr. FASEB J. 6:3410-3420(1992).
[ 4] Galat A. Eur. J. Biochem. 216:689-707(1993).
[ 5] Hacker J., Fischer G. Mol. Microbiol. 10:445-456(1993).
[ 6] Wu J., Matunis M.J., Kraemer D., Blobel G., Coutavas E. J. Biol. Chem. 270:14209-14213(1995).

[1267] 467. Profilin signature

Profilin [1,2] is a small eukaryotic protein that binds to monomeric actin (G-actin) in a 1:1 ratio, thus preventing the
polymerization of actin into filaments (F-actin). It can also, in certain circumstances, promote actin polymerization.
Profilin also binds to polyphosphoinositides such as PIP2. Overall sequence similarity among profilin from organisms
which belong to different phyla (ranging from fungi to mammals) is low, but the N-terminal region is relatively well
conserved. This region is thought to be involved in the binding to actin. The signature pattern for profilin is based on
conserved residues at the N-terminal extremity. A protein structurally similar to profilin is present in the genome of
variola and vaccinia viruses (gene A42R).

[1268] Consensus pattern: <x(0,1)-[STA]-x(0,1)-W-[DENQH]-x-[YI]-x-[DEQ]

[ 1] Haarer B.K., Brown S.S. Cell Motil. Cytoskeleton 17:71-74(1990).
[ 2] Sohn R.H., Goldschmidt-Clermont P. BioEssays 16:465-472(1994).
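
The profilin pattern begins with `<`, which restricts a match to the N-terminal extremity of the sequence. The sketch below illustrates the effect of that anchor by comparing an anchored and an unanchored translation of the pattern. Both regular expressions are hand translations, and the two test sequences are hypothetical strings written only for this example.

```python
import re

# <x(0,1)-[STA]-x(0,1)-W-[DENQH]-x-[YI]-x-[DEQ]; '<' becomes '^' (start of sequence).
PROFILIN_ANCHORED = re.compile(r"^.{0,1}[STA].{0,1}W[DENQH].[YI].[DEQ]")
# The same motif without the N-terminal anchor, for comparison only.
PROFILIN_UNANCHORED = re.compile(r".{0,1}[STA].{0,1}W[DENQH].[YI].[DEQ]")

# Hypothetical sequences: the motif at the N-terminus, and the same motif
# pushed two residues away from the N-terminus.
n_terminal = "MAGWNAYIDNLM"
shifted = "KKMAGWNAYIDNLM"

anchored_hit = PROFILIN_ANCHORED.search(n_terminal)
anchored_miss = PROFILIN_ANCHORED.search(shifted)
unanchored_hit = PROFILIN_UNANCHORED.search(shifted)
```

Dropping the anchor would let the motif match anywhere in a protein, which is why the anchored form is the faithful reading of the pattern as written.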

[1269] 468. Protamine P1 signature

Protamines are small, highly basic proteins that substitute for histones in sperm chromatin during the haploid phase
of spermatogenesis. They pack sperm DNA into a highly condensed, stable and inactive complex. There are two
different types of mammalian protamine, called P1 and P2. P1 has been found in all species studied, while P2 is
sometimes absent. There seems to be a single type of avian protamine whose sequence is closely related to that of
mammalian P1 [1]. As a signature for this family of proteins, a conserved region was selected at the N-terminal extremity
of the sequence.

[1270] Consensus pattern: [AV]-R-[NFY]-R-x(2,3)-[ST]-x-S-x-S

[1271] [ 1] Oliva R., Goren R., Dixon G.H. J. Biol. Chem. 264:17627-17630(1989).

[1272] 469. Sperm histone P2 (protamine P2)

This protein, also known as protamine P2, can substitute for histones in the chromatin of sperm. The alignment contains
both the sequence of the mature P2 protein and its propeptide.

[1273] 470. Proteasome A-type subunits signature

The proteasome (or macropain) (EC 3.4.99.46) [1 to 5,E1] is a eukaryotic and archaebacterial multicatalytic proteinase
complex that seems to be involved in an ATP/ubiquitin-dependent nonlysosomal proteolytic pathway. In eukaryotes the
proteasome is composed of about 28 distinct subunits which form a highly ordered ring-shaped structure (20S ring) of
about 700 Kd. Most proteasome subunits can be classified, on the basis of sequence similarities, into two groups, A
and B. Subunits that belong to the A-type group are proteins of 210 to 290 amino acids that share a number of
conserved sequence regions. Subunits that are known to belong to this family are listed below. - Vertebrate subunits
C2 (nu), C3, C8, C9, iota and zeta. - Drosophila PROS-25, PROS-28.1, PROS-29 and PROS-35. - Yeast C1 (PRS1),
C5 (PRS3), C7-alpha (Y8) (PRS2), Y7, Y13, PRE5, PRE6 and PUP2. - Arabidopsis thaliana subunits alpha and PSM30.
- Thermoplasma acidophilum alpha-subunit. In this archaebacterium the proteasome is composed of only two different
subunits. As a signature pattern for proteasome A-type subunits the best conserved region was selected, which is
located in the N-terminal part of these proteins.

[1274] Consensus pattern: [FY]-x(4)-[STNV]-x-[FYW]-S-P-x-G-[RKH]-x(2)-Q-[LIVM]-[DE]-Y-[SAD]-x(2)-[SAG].
These proteins belong to family T1 in the classification of peptidases [6,E2].

[ 1] Rivett A.J. Biochem. J. 291:1-10(1993).
[ 2] Rivett A.J. Arch. Biochem. Biophys. 268:1-8(1989).
[ 3] Goldberg A.L., Rock K.L. Nature 357:375-379(1992).
[ 4] Wilk S. Enzyme Protein 47:187-188(1993).
[ 5] Hilt W., Wolf D.H. Trends Biochem. Sci. 21:96-102(1996).
[ 6] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[1275] Proteasome B-type subunits signature

The proteasome (or macropain) (EC 3.4.99.46) [1 to 5,E1] is a eukaryotic and archaebacterial multicatalytic proteinase
complex that seems to be involved in an ATP/ubiquitin-dependent nonlysosomal proteolytic pathway. In eukaryotes
the proteasome is composed of about 28 distinct subunits which form a highly ordered ring-shaped structure (20S ring)
of about 700 Kd. Most proteasome subunits can be classified, on the basis of sequence similarities, into two groups,
A and B. Subunits that belong to the B-type group are proteins of 190 to 290 amino acids that share a number of
conserved sequence regions. Subunits that are known to belong to this family are listed below. - Vertebrate subunits
C5, beta, delta, epsilon, theta (C10-II), LMP2/RING12, C13 (LMP7/RING10), C7-I and MECL-1. - Yeast PRE1, PRE2
(PRG1), PRE3, PRE4, PRS3, PUP1 and PUP3. - Drosophila L(3)73Ai. - Fission yeast pts1. - Thermoplasma
acidophilum beta-subunit. In this archaebacterium the proteasome is composed of only two different subunits. As a signature
pattern for proteasome B-type subunits the best conserved region was selected, which is located in the N-terminal part

of these proteins.

[1276] Consensus pattern: [LIVMA]-[GSA]-[LIVMF]-x-[FYLVGAC]-x(2)-[GSACF...]-[LIVMST...]-...-[GSTACV]-[DES]-x(15)-[RK]-x(12,13)-G-x(2)-[GSTA]-D.
These proteins belong to family T1 in the classification of peptidases [6,E2].

[ 1] Rivett A.J. Biochem. J. 291:1-10(1993).
[ 2] Rivett A.J. Arch. Biochem. Biophys. 268:1-8(1989).
[ 3] Goldberg A.L., Rock K.L. Nature 357:375-379(1992).
[ 4] Wilk S. Enzyme Protein 47:187-188(1993).
[ 5] Hilt W., Wolf D.H. Trends Biochem. Sci. 21:96-102(1996).
[ 6] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[1277] 471. (pyr redox) Pyridine nucleotide-disulphide oxidoreductases class-I active site

The pyridine nucleotide-disulphide oxidoreductases are FAD flavoproteins which contain a pair of redox-active
cysteines involved in the transfer of reducing equivalents from the FAD cofactor to the substrate. On the basis of
sequence and structural similarities [1] these enzymes can be classified into two categories. The first category groups
together the following enzymes [2 to 6]: - Glutathione reductase (EC 1.6.4.2) (GR). - Higher eukaryotes thioredoxin
reductase (EC 1.6.4.5). - Trypanothione reductase (EC 1.6.4.8). - Lipoamide dehydrogenase (EC 1.8.1.4), the E3
component of alpha-ketoacid dehydrogenase complexes. - Mercuric reductase (EC 1.16.1.1). The sequence around
the two cysteines involved in the redox-active disulfide bond is conserved and can be used as a signature pattern.
[1278] Consensus pattern: G-G-x-C-[LIVA]-x(2)-G-C-[LIVM]-P [The two C's form the active site disulfide bond]. In
positions 6 and 7 of the pattern all known sequences have Asn-(Val/Ile) with the exception of GR from plant chloroplasts
and from cyanobacteria which have Ile-Arg [7].
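
The consensus patterns quoted throughout this section follow a compact notation: residues are separated by hyphens, x stands for any amino acid, square brackets list the residues allowed at a position, a number in parentheses gives a repeat count or range, and a leading < ties the match to the N-terminus. As a rough illustration only, the sketch below translates such a pattern into a Python regular expression and scans a sequence with it. The function names and the toy sequence are hypothetical and are not part of the described procedure; only the pattern itself comes from the text.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    pattern = pattern.strip().rstrip(".-")
    prefix = suffix = ""
    if pattern.startswith("<"):      # '<' anchors the match at the N-terminus
        prefix, pattern = "^", pattern[1:]
    if pattern.endswith(">"):        # '>' anchors the match at the C-terminus
        suffix, pattern = "$", pattern[:-1]
    element_re = r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(element_re, element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            residue = "."                          # any amino acid
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"   # excluded residues
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{%s}" % low
        else:
            repeat = "{%s,%s}" % (low, high)
        parts.append(residue + repeat)
    return prefix + "".join(parts) + suffix

def scan(pattern, sequence):
    """Return (1-based start, matched text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]

# Toy sequence (hypothetical) containing one site with Asn-Val at pattern positions 6 and 7.
print(scan("G-G-x-C-[LIVA]-x(2)-G-C-[LIVM]-P", "MKGGTCVNVGCVPKK"))
```

A sequence carrying the chloroplast-type Ile-Arg at positions 6 and 7 would still match, because those positions are written as x(2) in the pattern.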

[ 1] Kuriyan J., Krishna T.S.R., Wong L., Guenther B., Pahler A., Williams C.H. Jr., Model P. Nature 352:172-174(1991).
[ 2] Rice D.W., Schulz G.E., Guest J.R. J. Mol. Biol. 174:483-496(1984).
[ 3] Brown N.L. Trends Biochem. Sci. 10:400-402(1985).
[ 4] Carothers D.J., Pons G., Patel M.S. Arch. Biochem. Biophys. 268:409-425(1989).
[ 5] Walsh C.T., Bradley M., Nadeau K. Trends Biochem. Sci. 16:305-309(1991).
[ 6] Gasdaska P.Y., Gasdaska J.R., Cochran S., Powis G. FEBS Lett. 373:5-9(1995).
[ 7] Creissen G., Edwards E.A., Enard C., Wellburn A., Mullineaux P. Plant J. 2:129-131(1991).

[1279] 472. (pyridoxal deC) DDC / GAD / HDC / TyrDC pyridoxal-phosphate attachment site (pyridoxal deC) 
Three different enzymes - all pyridoxal-dependent decarboxylases - seem to share regions of sequence similarity
[1,2,3,4], especially in the vicinity of the lysine residue which serves as the attachment site for the pyridoxal-phosphate
(PLP) group. These enzymes are: - Glutamate decarboxylase (EC 4.1.1.15) (GAD). Catalyzes the decarboxylation of
glutamate into the neurotransmitter GABA (4-aminobutanoate). - Histidine decarboxylase (EC 4.1.1.22) (HDC).
Catalyzes the decarboxylation of histidine to histamine. There are two completely unrelated types of HDC: those that use
PLP as a cofactor (found in Gram-negative bacteria and mammals), and those that contain a covalently bound pyruvoyl
residue (found in Gram-positive bacteria). - Aromatic-L-amino-acid decarboxylase (EC 4.1.1.28) (DDC), also known
as L-dopa decarboxylase or tryptophan decarboxylase. DDC catalyzes the decarboxylation of tryptophan to tryptamine.
It also acts on 5-hydroxy-tryptophan and dihydroxyphenylalanine (L-dopa). - Tyrosine decarboxylase (EC 4.1.1.25)
(TyrDC), which converts tyrosine into tyramine, a precursor of isoquinoline alkaloids and various amides. These enzymes
are collectively known as group II decarboxylases [3,4].

[1280] Consensus pattern: S-[LIVMFYW]-x(5)-K-[LIVMFYWG](2)-x(3)-[LIVMFYW]-x-[CA]-x(2)-[LIVMFYWQ]-x(2)-[RK]
[K is the pyridoxal-P attachment site]
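
Because the pattern above singles out one residue (the lysine that carries the pyridoxal-phosphate group), a match can report the position of that residue as well as the position of the whole motif. The following is a minimal sketch under stated assumptions: the regular expression is a hand translation of the pattern, and the constant name, function name and toy sequence are hypothetical.

```python
import re

# Hand translation of the consensus pattern quoted above; the single capture
# group marks the lysine annotated as the pyridoxal-P attachment site.
PLP_SITE = re.compile(
    r"S[LIVMFYW].{5}(K)[LIVMFYWG]{2}.{3}[LIVMFYW].[CA].{2}[LIVMFYWQ].{2}[RK]"
)

def attachment_lysines(sequence):
    """Return the 1-based position of the annotated lysine in each match."""
    return [m.start(1) + 1 for m in PLP_SITE.finditer(sequence)]

# Toy sequence (hypothetical), built so that exactly one site is present.
toy = "MMSLAAAAAKLLAAALACAALAAR"
print(attachment_lysines(toy))
```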

[ 1] Jackson F.R. J. Mol. Evol. 31:325-329(1990).
[ 2] Joseph D.R., Sullivan P., Wang Y.-M., Kozak C., Fenstermacher D.A., Behrendsen M.E., Zahnow C.A. Proc. Natl. Acad. Sci. U.S.A. 87:733-737(1990).
[ 3] Sandmeier E., Hale T.I., Christen P. Eur. J. Biochem. 221:997-1002(1994).
[ 4] Ishii S., Mizuguchi H., Nishino J., Hayashi H., Kagamiyama H. J. Biochem. 120:369-376(1996).

[1281] 473. Regulator of chromosome condensation (RCC1) signatures (RCC1)

The regulator of chromosome condensation (RCC1) [1] is a eukaryotic protein which binds to chromatin and interacts
with ran, a nuclear GTP-binding protein, to promote the loss of bound GDP and the uptake of fresh GTP, thus acting
as a guanine-nucleotide dissociation stimulator (GDS) [2]. The interaction of RCC1 with ran probably plays an important
role in the regulation of gene expression. RCC1, known as PRP20 or SRM1 in yeast, pim1 in fission yeast and BJ1 in
Drosophila, is a protein that contains seven tandem repeats of a domain of about 50 to 60 amino acids. As shown in
the following schematic representation, the repeats make up the major part of the length of the protein. Outside the
repeat region, there is just a small N-terminal domain of about 40 to 50 residues and, in the Drosophila protein only, a
C-terminal domain of about 130 residues.

+----+-------+-------+-------+-------+-------+-------+-------+------------+
|N-t.|Rpt. 1 |Rpt. 2 |Rpt. 3 |Rpt. 4 |Rpt. 5 |Rpt. 6 |Rpt. 7 | C-terminal |
+----+-------+-------+-------+-------+-------+-------+-------+------------+
                                                              Only in Drosophila

Two signature patterns for RCC1 were developed. The first is found in the N-terminal part
of the second repeat; this is the most conserved part of RCC1. The second is derived from conserved positions in the
C-terminal part of each repeat and detects up to five copies of the repeated domain. The RCC1-type of repeat is also
found in the X-linked retinitis pigmentosa GTPase regulator [3].
[1282] Consensus pattern: G-x-N-D-x(2)-[AV]-L-G-R-x-T
Consensus pattern: [LIVMFA]-[STAGC](2)-G-x(2)-H-[STAGLI]-[LIVMFA]-x-[LIVM]
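
Since the second pattern is said to detect several copies of the repeated domain within one protein, a scan with it naturally yields a count of repeat copies rather than a single hit. The sketch below illustrates this with a hand-translated regular expression; the names and the toy sequence (three artificial repeat cores separated by spacer residues) are hypothetical and are not taken from any real RCC1 sequence.

```python
import re

# Hand translation of the second RCC1 pattern quoted above:
# [LIVMFA]-[STAGC](2)-G-x(2)-H-[STAGLI]-[LIVMFA]-x-[LIVM]
RCC1_REPEAT = re.compile(r"[LIVMFA][STAGC]{2}G.{2}H[STAGLI][LIVMFA].[LIVM]")

def repeat_starts(sequence):
    """Return 1-based start positions of non-overlapping matches."""
    return [m.start() + 1 for m in RCC1_REPEAT.finditer(sequence)]

# Toy sequence (hypothetical): three copies of an 11-residue core plus spacers.
toy = ("LSSGAAHSLAL" + "PPPP") * 3
starts = repeat_starts(toy)
print(len(starts), starts)
```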

[ 1] Dasso M. Trends Biochem. Sci. 18:96-101(1993).
[ 2] Boguski M.S., McCormick F. Nature 366:643-654(1993).
[ 3] Roepman R., Van Duijnhoven G., Rosenberg T., Pinckers A.J.L.G., Bleeker-Wagemakers, Bergen A.A.B., Post J., Beck A., Reinhardt R., Ropers H.-H., Cremers F., Berger W. Hum. Mol. Genet. 5:1035-1041(1996).

[1283] 474. RNA 3'-terminal phosphate cyclase signature (RCT)

RNA 3'-terminal phosphate cyclase (EC 6.5.1.4) [1,2] catalyzes the conversion of 3'-phosphate to a 2',3'-cyclic
phosphodiester at the end of RNA. The biological role of this enzyme is unknown but it is likely to function in some aspects
of cellular RNA processing. The reaction catalyzed by the enzyme occurs in three steps: 1) adenylation of the enzyme
by ATP; 2) the enzyme acts on RNA-3'-terminal phosphate to produce RNA-3'-terminal diphosphate adenylate; 3) release
of AMP and cyclisation by a non-catalytic nucleophilic attack by the adjacent 2'-hydroxyl on the phosphorus in
the diester linkage. This enzyme, which has been characterized in human (where there seem to be at least three
isozymes) and Escherichia coli (gene rtcA), seems to be taxonomically widespread. It is found in insects, plants, fungi
(gene RTC1 in yeast) and in archaebacteria. RNA cyclase is a protein of 36 to 42 Kd. The best conserved region,
which is used as a signature pattern, is a glycine-rich stretch of residues located in the central part of the sequence
and which is reminiscent of various ATP, GTP or AMP glycine-rich loops. In this context, the conserved Arg (His in the
E. coli enzyme) could be the AMP-binding residue.
[1284] Consensus pattern: [RH]-G-x(2)-P-x-G(3)-x-[LIV]

[ 1] Genschik P., Billy E., Swianiewicz M., Filipowicz W. EMBO J. 16:2955-2967(1997).
[ 2] Filipowicz W., Vicente O. Meth. Enzymol. 181:499-510(1990).

[1285] 475. REV protein (anti-repression trans-activator protein) 

[1286] 476. Prokaryotic-type class I peptide chain release factors signature (RF-1)

Peptide chain release factors (RFs) are required for the termination of protein biosynthesis [1]. At present two classes
of RFs can be distinguished. Class I RFs bind to ribosomes that have encountered a stop codon at their decoding site
and induce release of the nascent polypeptide. Class II RFs are GTP-binding proteins that interact with class I RFs
and enhance class I RF activity. In prokaryotes there are two class I RFs that act in a codon-specific manner [2]: RF-1
(gene prfA) mediates UAA- and UAG-dependent termination while RF-2 (gene prfB) mediates UAA- and UGA-dependent
termination. RF-1 and RF-2 are structurally and evolutionarily related proteins which have been shown [3] to make up
a family that also contains the following proteins: - Fungal MRF1, a mitochondrial RF (m-RF) which recognizes the
UAA and UAG codons. - Escherichia coli RF-H, a protein of unknown function. - Escherichia coli hypothetical protein
yaeJ and a close Pseudomonas putida homolog. A highly conserved region located in the central part of the 40 to 45
Kd RF-1/2 and m-RF and in the N-terminal of the 15 to 16 Kd RF-H and yaeJ is used as a signature pattern.
[1287] Consensus pattern: [AR]-[STA]-x-G-x-G-G-Q-[HNGCS]-V-N-x(3)-[ST]-A-[IV]

Note that prokaryotic-type class I RFs display no significant sequence similarity to prokaryotic-type class II which belong 
to the family of GTP-binding elongation factors nor to eukaryotic class I or class II RFs. 
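
The codon specificity of the two prokaryotic class I release factors stated above can be written down as a small lookup table. The snippet below is only an illustrative restatement of that sentence in code; the dictionary and function names are hypothetical.

```python
from typing import List

# Stop codons recognized by each prokaryotic class I release factor,
# as stated in the text (RF-1: UAA and UAG; RF-2: UAA and UGA).
STOP_CODONS_BY_FACTOR = {
    "RF-1": {"UAA", "UAG"},
    "RF-2": {"UAA", "UGA"},
}

def factors_for_codon(codon: str) -> List[str]:
    """Return the class I release factors that recognize the given codon."""
    codon = codon.upper().replace("T", "U")   # accept DNA-style spelling too
    return sorted(
        factor
        for factor, codons in STOP_CODONS_BY_FACTOR.items()
        if codon in codons
    )

for codon in ("UAA", "UAG", "UGA", "AUG"):
    print(codon, factors_for_codon(codon))
```

UAA is the only stop codon recognized by both factors, while a sense codon such as AUG is recognized by neither.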

[ 1] Tate W.P., Poole E.S., Mannering S.M. Prog. Nucleic Acids Res. Mol. Biol. 52:293-335(1996).
[ 2] Craigen W.J., Lee C.C., Caskey C.T. Mol. Microbiol. 4:861-865(1990).
[ 3] Pel H.J., Rep M., Grivell L.A. Nucleic Acids Res. 20:4423-4428(1992).

[1288] 477. RIO1/ZK632.3/MJ0444 family signature

The following uncharacterized proteins are evolutionarily related [1]: - Yeast protein RIO1. - Caenorhabditis elegans
hypothetical protein ZK632.3. - Methanococcus jannaschii hypothetical protein MJ0444. - Thermoplasma acidophilum
hypothetical protein in rpoA2 3' region. The eukaryotic members of this family are proteins of about 55 to 60 Kd, while
the archaebacterial ones are half that size. The central part of these proteins is highly conserved. The best conserved
region is used as a signature pattern.

[1289] Consensus pattern: [LIVM]-V-H-[GA]-D-L-S-E-[FY]-N-x-[LIVM]
[1290] [ 1] Bairoch A. Unpublished observations (1997).

[1291] 478. (RiP) Shiga/ricin ribosomal inactivating toxins active site signature

A number of bacterial and plant toxins
act by inhibiting protein synthesis in eukaryotic cells. The toxins of the Shiga and ricin family inactivate 60S ribosomal
subunits by an N-glycosidic cleavage which releases a specific adenine base from the sugar-phosphate backbone of
28S rRNA [1,2,3]. The toxins which are known to function in this manner are: - Shiga toxin from Shigella dysenteriae
[4]. This toxin is composed of one copy of an enzymatically active A subunit and five copies of a B subunit responsible
for binding the toxin complex to specific receptors on the target cell surface. - Shiga-like toxins (SLT) are a group of
Escherichia coli toxins very similar in their structure and properties to Shiga toxin. The sequence of two types of these
toxins, SLT-1 [5] and SLT-2 [6], is known. - Ricin, a potent toxin from castor bean seeds. Ricin consists of two
glycosylated chains linked by a disulfide bond. The A chain is enzymatically active. The B chain is a lectin with a binding
preference for galactosides. Both chains are encoded by a single polypeptidic precursor. Ricin is classified as a type-II
ribosome-inactivating protein (RIP); other members of this family are agglutinin, also from castor bean, and abrin
from the seeds of the bean Abrus precatorius [7]. - Single chain ribosome-inactivating proteins (type-I RIP) from plants.
Examples of such proteins are: barley protein synthesis inhibitors I and II, mongolian snake-gourd trichosanthin, sponge
gourd luffin-A and -B, garden four-o'clock MAP, common pokeberry PAP-S and soapwort saporin-6 [7]. All these toxins
are structurally related. A conserved glutamic residue has been implicated [8] in the catalytic mechanism; it is located
near a conserved arginine which also plays a role in catalysis [9]. The signature that has been developed for these
proteins includes these catalytic residues.

[1292] Consensus pattern: [LIVMA]-x-[LIVMSTA](2)-x-E-[SAGV]-[STAL]-R-[FY]-[RKNQS]-x-[LIVM]-[EQS]-x(2)-[LIVMF]
[E and R are active site residues]

[1293] [ 1] Endo Y., Tsurugi K., Takeda Y., Ogasawara T., Igarashi K. Eur. J. Biochem. 171:45-50(1988).
[ 2] May M.J., Hartley M.R., Roberts L.M., Krieg P.A., Osborn R.W., Lord J.M. EMBO J. 8:301-308(1989).
[ 3] Funatsu G., Islam M.R., Minami Y., Sung-Sil K., Kimura M. Biochimie 73:1157-1161(1991).
[ 4] Strockbine N.A., Jackson M.P., Sung L.M., Holmes R.K., O'Brien A.D. J. Bacteriol. 170:1116-1122(1988).
[ 5] Calderwood S.B., Auclair F., Donohue-Rolfe A., Keusch G.T., Mekalanos J.J. Proc. Natl. Acad. Sci. U.S.A. 84:4364-4368(1987).
[ 6] Jackson M.P., Neill R.J., O'Brien A.D., Holmes R.K., Newland J.W. FEMS Microbiol. Lett. 44:109-114(1987).
[ 7] Barbieri L., Battelli M.G., Stirpe F. Biochim. Biophys. Acta 1154:237-282(1993).
[ 8] Hovde C.J., Calderwood S.B., Mekalanos J.J., Collier R.J. Proc. Natl. Acad. Sci. U.S.A. 85:2568-2572(1988).
[ 9] Monzingo A.F., Collins E.J., Ernst S.R., Irvin J.D., Robertus J.D. J. Mol. Biol. 233:705-715(1993).

[1294] 479. Bacterial RNA polymerase, alpha chain (RNA pol A bac) 

Members of this family include alpha subunits from eubacteria and alpha subunits from chloroplasts. The alpha subunit
of RNA polymerase consists of two independently folded domains, referred to as the amino-terminal and carboxyl-terminal
domains. The amino-terminal domain is involved in the interaction with the other subunits of the RNA polymerase. The
carboxyl-terminal domain interacts with the DNA and activators. The amino acid sequence of the alpha subunit is
conserved in prokaryotic and chloroplast RNA polymerases. There are three regions of particularly strong conservation,
two in the amino-terminal domain and one in the carboxyl-terminal domain [3].

[1] Zhang G, Darst SA; Science 1998;281:262-266.
[2] Jeon YH, Negishi T, Shirakawa M, Yamazaki T, Fujita N, Ishihama A, Kyogoku Y; Science 1995;270:1495-1497.
[3] Ebright RH, Busby S; Curr Opin Genet Dev 1995;5:197-203.
[4] Murakami K, Kimura M, Owens JT, Meares CF, Ishihama A; Proc Natl Acad Sci USA 1997;94:1709-1714.
[1295] 480. RNA polymerase beta subunit (RNA pol B) 

RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase
compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). Each RNA polymerase
complex contains two related members of this family; in each case they are the two largest subunits.

[1] Falkenburg D, Dworniczak B, Faust DM, Bautz EK; J Mol Biol 1987;195:929-937.
[1296] 481. RNA polymerases H / 23 Kd subunits signature

In eukaryotes, there are three different forms of DNA-dependent RNA polymerases (EC 2.7.7.6) transcribing different
sets of genes. Each class of RNA polymerase is an assemblage of ten to twelve different polypeptides. In
archaebacteria, there is generally a single form of RNA polymerase which also consists of an oligomeric assemblage of 10 to 13
polypeptides. Archaebacterial subunit H (gene rpoH) [1,2] is a small protein of about 8.5 to 10 Kd; it is evolutionarily
related to the C-terminal part of a 23 Kd component shared by all three forms of eukaryotic RNA polymerases (gene
RPB5 in yeast and POLR2E in mammals). As a signature pattern, a conserved region was selected which is located at
the N-terminal extremity of subunit H; this region contains two histidines that could play a role in the binding of a metal ion.
[1297] Consensus pattern: H-[NEI]-[LIVM]-V-P-x-H-x(2)-[LIVM]-x(2)-[DE]

[ 1] Klenk H.-P., Palm P., Lottspeich F., Zillig W. Proc. Natl. Acad. Sci. U.S.A. 89:407-410(1992).
[ 2] Thiru A., Hodach M., Eloranta J.J., Kostourou V., Weinzierl R.O., Matthews S. J. Mol. Biol. 287:753-760(1999).

[1298] 482. RNA polymerases K / 14 to 18 Kd subunits signature 

In eukaryotes, there are three different forms of DNA-dependent RNA polymerases (EC 2.7.7.6) transcribing different
sets of genes. Each class of RNA polymerase is an assemblage of ten to twelve different polypeptides. In
archaebacteria, there is generally a single form of RNA polymerase which also consists of an oligomeric assemblage of 10 to 13
polypeptides. A component of 14 to 18 Kd shared by all three forms of eukaryotic RNA polymerases and which has
been sequenced in budding yeast (gene RPB6 or RPO26), in fission yeast (gene rpb6 or rpo15), in human and in African
swine fever virus [1] is evolutionarily related [2] to archaebacterial subunit K (gene rpoK). The archaebacterial protein
is colinear with the C-terminal part of the eukaryotic subunit.
[1299] Consensus pattern: [ST]-x-[FY]-E-x-[AT]-R-x-[LIVM]-[GSA]-x-R-[SA]-x-Q

[ 1] Lu Z., Kutish G.F., Sussman M.D., Rock D.L. Nucleic Acids Res. 21:2940-2940(1993).
[ 2] McKune K., Woychik N.A. J. Bacteriol. 176:4754-4756(1994).

[1300] 483. RNA polymerases L / 13 to 16 Kd subunits signature

In eukaryotes, there are three different forms of DNA-dependent RNA polymerases (EC 2.7.7.6) transcribing different
sets of genes. Each class of RNA polymerase is an assemblage of ten to twelve different polypeptides. In
archaebacteria, there is generally a single form of RNA polymerase which also consists of an oligomeric assemblage of 10 to 13
polypeptides. It has been shown that small subunits of about 13 to 16 Kd found in all three types of eukaryotic
polymerases are highly conserved. Subunits known to belong to this family are: - Budding yeast RPC19 subunit from RNA
polymerases I and III [1]. - Budding yeast RPB11 subunit from RNA polymerase II [2]. - Mammalian RPB11 (gene
POLR2J) from RNA polymerase II. - Caenorhabditis elegans hypothetical [...]
RNA polymerase subunit L (gene rpoL). - Sulfolobus acidocaldarius RNA polymerase subunit L (gene rpoL) [3]. As a
signature pattern, a conserved region was selected which is located at the N-terminal extremity of these polymerase
subunits; this region contains two cysteines that could play a role in the binding of a metal ion.
[1301] Consensus pattern: [DE](2)-H-[ST]-[LIVM]-[GAP]-N-x(11)-V-x-[FM]-x(2)-Y-x(3)-H-P

[ 1] Dequard-Chablat M., Riva M., Carles C., Sentenac A. J. Biol. Chem. 266:15300-15307(1991).
[ 2] Woychik N.A., McKune K., Lane W.S., Young R.A. Gene Expr. 3:77-82(1993).
[ 3] Langer D. EMBL/GenBank: X70805.

[1302] 484. RNA polymerases N / 8 Kd subunits signature

In eukaryotes, there are three different forms of DNA-dependent RNA polymerases (EC 2.7.7.6) transcribing different
sets of genes. Each class of RNA polymerase is an assemblage of ten to twelve different polypeptides. In
archaebacteria, there is generally a single form of RNA polymerase which also consists of an oligomeric assemblage of 10 to 13
polypeptides. Archaebacterial subunit N (gene rpoN) [1] is a small protein of about 8 Kd; it is evolutionarily related [2]
to an 8.3 Kd component shared by all three forms of eukaryotic RNA polymerases (gene RPB10 in yeast and POLR2J
in mammals) as well as to African swine fever virus protein CP80R [3]. As a signature pattern, a conserved region was
selected which is located at the N-terminal extremity of these polymerase subunits; this region contains two cysteines
that could play a role in the binding of a metal ion.
[1303] Consensus pattern: [LIVMF](2)-P-[LIVM]-x-C-F-[ST]-C-G

[ 1] Langer D., Hain J., Thuriaux P., Zillig W. Proc. Natl. Acad. Sci. U.S.A. 92:5768-5772(1995).
[ 2] McKune K., Woychik N.A. J. Bacteriol. 176:4754-4756(1994).

(1995). 

[1304] 485. Ribonuclease HII

[1] Mian IS; Nucleic Acids Res 1997;25:3187-3189. 

[1305] 486. Ribonuclease PH signature

Prokaryotic ribonuclease PH (EC 2.7.7.56) (RNase PH) [1] is a phosphorolytic exoribonuclease that removes nucleotide
residues following the -CCA terminus of tRNA and adds nucleotides to the ends of RNA molecules by using nucleoside
diphosphates as substrates. RNase PH is a conserved protein of about 240 amino-acid residues. It is evolutionarily
related to Caenorhabditis elegans hypothetical protein B0564.1. As a signature pattern, the most highly conserved
region was selected, which is located in the central part of these proteins.

Consensus sequence: C-[DE]-[LIVM](2)-Q-[GTA]-D-G-[SG]-x(2)-[TA]-A

[ 1] Kelly K.O., Deutscher M.P. J. Biol. Chem. 267:17153-17158(1992).

[1306] 487. RanBP1 domain

[1] Di Matteo G, Fuschi P, Zerfass K, Moretti S, Ricordy R, Cenciarelli C, Tripodi M, Jansen-Durr P, Lavia P; Cell Growth Differ 1995;6:1213-1224.

[1307] 488. Rhodanese signatures

Rhodanese (thiosulfate sulfurtransferase) (EC 2.8.1.1) [1,2] is an enzyme which catalyzes the transfer of the sulfane
atom of thiosulfate to cyanide, to form sulfite and thiocyanate. In vertebrates, rhodanese is a mitochondrial enzyme of
about 300 amino-acid residues involved in forming iron-sulfur complexes and cyanide detoxification. A cysteine residue
takes part in the catalytic mechanism. Some bacterial proteins closely related to rhodanese are also thought to express
a sulfotransferase activity. These are: - Azotobacter vinelandii rhdA. - Escherichia coli sseA [3]. - Saccharopolyspora
erythraea cysA [4]. - Synechococcus strain PCC 7942 rhdA [5]. RhdA is a periplasmic protein probably involved in the
transport of sulfur compounds. Two patterns for the rhodanese family were developed. They are based on highly
conserved regions, one of which is located in the N-terminal region, the other at the C-terminal extremity of the enzyme.
[1308] Consensus pattern: [FY]-x(3)-H-[LIV]-P-G-A-x(2)-[LIVF]

Consensus pattern: [FY]-[DEAF]-G-[SA]-W-x-E-[FYW]

[ 1] Westley J. Meth. Enzymol. 77:285-291(1981).
[ 2] Weiland K.L., Dooley T.P. Biochem. J. 275:227-231(1991).
[ 3] Rudd K.E. Unpublished observations (1993).
[ 4] Donadio S., Shafiee A., Hutchinson C.R. J. Bacteriol. 172:350-360(1990).
[ 5] Laudenbach D.E., Ehrhardt D., Green L., Grossman A.R. J. Bacteriol. 173:2751-2760(1991).

[1309] 489. Ribonuclease III family signature 

Prokaryotic ribonuclease III (EC 3.1.26.3) (gene rnc) [1] is an enzyme that digests double-stranded RNA. It is involved
in the processing of ribosomal RNA precursors and of some mRNAs. RNase III is evolutionarily related [2] to the following
proteins: - Fission yeast pac1, a ribonuclease that probably inhibits mating and meiosis by degrading a specific mRNA
required for sexual development. - Yeast ribonuclease III (gene RNT1), a dsRNA-specific nuclease that cleaves
eukaryotic preribosomal RNA at various sites. - Caenorhabditis elegans hypothetical protein F26E4.13. - Paramecium
bursaria chlorella virus 1 protein A464R. - Synechocystis strain PCC 6803 hypothetical protein slr0346. - Fission yeast
hypothetical protein SpAC8A4.08c, a protein with an N-terminal helicase domain and a C-terminal RNase III domain. -
Caenorhabditis elegans hypothetical protein K12H4.8, a protein with the same structure as SpAC8A4.08c. These
proteins share regions of sequence similarity, one of which is a highly conserved stretch of 9 residues which has been
developed as a signature pattern.

[1310] Consensus pattern: [DEQ]-[RQ]-[LM]-E-[FYW]-[LV]-G-D-[SAR]

[ 1] Nashimoto H., Uchida H. Mol. Gen. Genet. 201:25-29(1985).
[ 2] Mian I.S. Nucleic Acids Res. 25:3187-3195(1997).

[1311] 490. Rieske iron-sulfur protein signatures 

Ubiquinol-cytochrome c reductase (EC 1.10.2.2) (also known as the bc1 complex or complex III) is one of the electron
transport chains of mitochondria and of some aerobic prokaryotes; it catalyzes the oxidoreduction of ubiquinol and
cytochrome c. In the chloroplast of plants and in cyanobacteria plastoquinone-plastocyanin reductase (EC 1.10.99.1)
(also known as the b6f complex) is functionally similar and catalyzes the oxidoreduction of plastoquinol and cytochrome
f. One of the components of these electron transfer systems is an iron-sulfur protein with a 2Fe-2S cluster, which is
called the Rieske protein [1,2]. The Rieske protein contains approximately 190 amino acid residues. The iron-sulfur
cluster is complexed to the protein through cysteine and histidine residues. Two perfectly conserved regions in Rieske
proteins contain all the residues that bind the iron-sulfur cluster. Both regions contain two cysteines and a histidine.
The first cysteine and the histidine are 2Fe-2S ligands while the remaining cysteines form a disulfide bond [3]. Two
conserved regions were selected as signature patterns.

[1312] Consensus pattern: C-[TK]-H-L-G-C-[LIVST] [The first C and the H are 2Fe-2S ligands] [The second C is
involved in a disulfide bond]

Consensus pattern: C-P-C-H-x-[GSA] [The first C and the H are 2Fe-2S ligands] [The second C is involved in a disulfide
bond]

[ 1] Gatti F.L., Meinhardt S.W., Ohnishi T., Tzagoloff A. J. Mol. Biol. 205:421-435(1989).
[ 2] Kallas T., Spiller S., Malkin R. Proc. Natl. Acad. Sci. U.S.A. 85:5794-5798(1988).
[ 3] Iwata S., Saynovits M., Link T.A., Michel H. Structure 4:567-579(1996).

[1313] 491. Ribosomal protein L1 signature

Ribosomal protein L1 is the largest protein from the large ribosomal subunit. In Escherichia coli, L1 is known to bind to
the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups:
- Eubacterial L1. - Algal and plant chloroplast L1. - Cyanelle L1. - Archaebacterial L1. - Vertebrate L10A. - Yeast
SSM1. As a signature pattern, the best conserved region was selected, located in the central section of these proteins.
It is located at the end of an alpha helix thought to be involved in RNA-binding.

[1314] Consensus pattern: [...]-[DENSTKQ]

[ 1] Nikonov S.V., Nevskaya N., Eliseikina I.A., Fomenkova N.P., Nikulin A., Ossina N., Garber M., Jonsson B.-H.,
Briand C., Al-Karadaghi S., Svensson L.A., Aevarsson A., Liljas A. EMBO J. 15:1350-1359(1996).
[ 2] Olvera J., Wool I.G. Biochem. Biophys. Res. Commun. 220:954-957(1996).

[1315] 492. Ribosomal protein L10 signature

Ribosomal protein L10 is one of the proteins from the large ribosomal subunit. L10 is a protein of 162 to 185
amino-acid residues which has only been found so far in eubacteria. A conserved region located in the N-terminal section of
these proteins was used as a signature pattern.

[1316] Consensus pattern: [DEH]-x(2)-[GS]-[LIVMF]-[STN]-[VA]-x-[DEQK]-[LIVMA]-x(2)-[LM]-R
[1317] 493. Ribosomal protein L10e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of: - Vertebrate L10 (QM) [1]. - Plant L10. - Caenorhabditis elegans [...]. -
Yeast L10 (QSR1). - Methanococcus jannaschii MJ0543. These proteins have 174 to 232 amino-acid residues. A
conserved region located in the central section was selected as a signature pattern.

[1318] Consensus pattern: R-x-A-[FYW]-G-K-[PA]-x-G-x(2)-A-R-V

[ 1] Chan Y.-L., Diaz J.-J., Denoroy L., Madjar J.-J., Wool I.G. Biochem. Biophys. Res. Commun. 255:952-956(1996).

[1319] 494. Ribosomal protein L11 signature

[1320] Ribosomal protein L11 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L11 is
known to bind directly to the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence
similarities [1,2], groups:

- Eubacterial L11.
- Plant chloroplast L11 (nuclear-encoded).
- Red algal chloroplast L11.
- Cyanelle L11.
- Archaebacterial L11.
- Mammalian L12.
- Plants L12.
- Yeast L12 (YL15).

[1321] L11 is a protein of 140 to 165 amino-acid residues. A conserved region located in the C-terminal section of
these proteins was selected as a signature pattern. In Escherichia coli, the C-terminal half of L11 has been shown [3]
to be in an extended and loosely folded conformation and is likely to be buried within the ribosomal structure.
[1322] Consensus pattern: [RKN]-x-[LIVM]-x-G-[ST]-x(2)-[SNQ]-[LIVM]-G-x(2)-[LIVM]-x(0,1)-[DENG]

[ 1] Pucciarelli G., Remacha M., Ballesta J.P.G. Nucleic Acids Res. 18:4409-4416(1990).
[ 2] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).
[ 3] Choli T. Biochem. Int. 19:1323-1338(1989).

[1323] 495. Ribosomal protein L7/L12 C-terminal domain

[1324] [1] Leijonmarck M, Liljas A; J Mol Biol 1987;195:555-579. 

[1325] 496. Ribosomal protein L13 signature

Ribosomal protein L13 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L13 is known to be one of the early assembly proteins of the 50S ribosomal subunit. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups: - Eubacterial L13.

- Plant chloroplast L13 (nuclear-encoded). - Red algal chloroplast L13.

- Archaebacterial L13. - Mammalian L13a (Tum P198). - Yeast Rp22 and Rp23.

[1326] L13 is a protein of 140 to 250 amino-acid residues. As a signature pattern, a conserved region was selected located in the C-terminal section of these proteins.

[1327] Consensus pattern: [LIVM]-[KRV]-[GK]-M-[LIV]-[PS]-x(4,5)-...-[GDN]
[1328] [ 1] Chan Y.-L., Olvera J., Glueck A., Wool I.G. J. Biol. Chem. 269:5589-5594(1994).
[1329] 497. Ribosomal protein L13e signature 

A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities [1]. One of these 
families consists of: 

- Vertebrate L13 (was previously known as Breast Basic Conserved protein 1 (BBC1)). - Drosophila L13. - Plant L13. - Yeast probable L13 (YM9375.11c).

These proteins have 199 to 218 amino-acid residues. As a signature pattern, a stretch of about 16 residues in the first third of these proteins was selected.

- Consensus pattern: [KR]-Y-x(2)-K-[LIVM]-R-[STA]-G-[KR]-G-F-[ST]-L-x-E
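The consensus patterns in this section follow the PROSITE convention: a single letter is a fixed residue, x is any residue, a bracketed set lists the residues allowed at one position, and a number in parentheses repeats the preceding element. As a minimal sketch only, such a pattern could be translated into a regular expression as shown below. The function name prosite_to_regex and the test sequence EXAMPLE are hypothetical and introduced purely for illustration; they are not part of the described procedure.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pieces = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Split e.g. "x(2)" or "[LIVM](2,3)" into a residue part and an optional repeat.
        match = re.fullmatch(r"([^()]+)(?:\((\d+)(?:,(\d+))?\))?", element)
        if match is None:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            regex = "."  # any residue
        elif residue.startswith("{") and residue.endswith("}"):
            regex = "[^" + residue[1:-1] + "]"  # any residue except those listed
        else:
            regex = residue  # fixed residue or [ABC] set
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        pieces.append(regex)
    return "".join(pieces)


L13E_PATTERN = "[KR]-Y-x(2)-K-[LIVM]-R-[STA]-G-[KR]-G-F-[ST]-L-x-E"
L13E_REGEX = re.compile(prosite_to_regex(L13E_PATTERN))

# Hypothetical sequence constructed to contain the signature; not a real protein.
EXAMPLE = "MAAKYAAKLRSGKGFSLAEMM"
hit = L13E_REGEX.search(EXAMPLE)
print(prosite_to_regex(L13E_PATTERN))
print(hit.start() + 1, hit.group())  # 1-based start position and matched stretch
```

The same helper could in principle be applied to any of the other complete consensus patterns listed here; patterns that are truncated in the text cannot be converted reliably.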

[1330] [ 1] Olvera J., Wool I.G. Biochem. Biophys. Res. Commun. 201:102-107(1994). 
[1331] 498. Ribosomal protein L14 signature

Ribosomal protein L14 is one of the proteins from the large ribosomal subunit. In eubacteria, L14 is known to bind directly to the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups: - Eubacterial L14. - Algal and plant chloroplast L14. - Cyanelle L14. - Archaebacterial L14. - Yeast L17A. - Mammalian L23.

- Caenorhabditis elegans L23 (B0336.10). - Higher eukaryotes mitochondrial L14.
- Yeast mitochondrial Yml38 (gene MRPL38).

L14 is a protein of 119 to 137 amino-acid residues. As a signature pattern, a conserved region located in the C-terminal half of these proteins was selected.

- Consensus pattern: [GA]-[LIV](3)-x(9,10)-[DNS]-G-x(4)-[FY]-x(2)-[NT]-x(2)-V-[LIV]

[1332] [ 1] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).
[1333] 499. Ribosomal protein L15 signature

Ribosomal protein L15 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L15 is known to bind the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups: - Eubacterial L15. - Plant chloroplast L15 (nuclear-encoded).

- Archaebacterial L15. - Vertebrate L27a. - Tetrahymena thermophila L29.

- Fungi L27a (L29, CRP-1, CYH2).

L15 is a protein of 144 to 154 amino-acid residues. As a signature pattern, a conserved region was selected in the C-terminal section of these proteins.

- Consensus pattern: K-[LIVM](2)-[GASL]-x-[GT]-x-[LIVMA]-x(2,5)-[LIVM]-x-[LIVMF]-x(3,4)-[LIVMFCA]-[ST]-x(2)-A-x(3)-[LIVM]-x(3)-G

[1334] [ 1] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993). 
[1335] 500. Ribosomal protein L15e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities 
[1]. One of these families consists of: 

- Mammalian L15. - Insect L15. - Plant L15. - Yeast YL10 (L13) (Rp15r).
- Thermoplasma acidophilum L15.

These proteins have about 200 amino-acid residues. As a signature pattern, a conserved region was selected located in the central section.

- Consensus pattern: [DE]-[KR]-A-R-x-L-G-[FY]-x-[SAP]-x(2)-G-[LIVMFY](4)-R-x-R-[IV]-x-R-G

[ 1] Zwickl P., Lupas A., Baumeister W. Biochem. Biophys. Res. Commun. 209:684-688(1995).
[1336] 501. Ribosomal protein L17 signature

Ribosomal protein L17 is one of the proteins from the large ribosomal subunit. L17 belongs to a family of ribosomal proteins which, on the basis of sequence similarities, groups: - Eubacterial L17.

- Yeast mitochondrial YmL8 (gene MRPL8).

Eubacterial L17 is a protein of 120 to 130 amino-acid residues. Yeast YmL8 is twice as large (238 residues); the sequence of its N-terminal half is colinear with that of eubacterial L17. As a signature pattern, a conserved region in the N-terminal section was selected.

- Consensus pattern: I-x-[ST]-[GT]-x(2)-[KR]-x-K-x(6)-[DE]-x-[LIMV]-...
[1337] 502. Ribosomal protein L18e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Vertebrate L18 (known as L14 in Xenopus) [1]. - Plant L18.

- Yeast L18 (Rp28). - Halobacterium marismortui HL29.
- Sulfolobus acidocaldarius HL29e.

These proteins have 115 to 187 amino-acid residues. A stretch of about 13 residues in the first third of these proteins has been selected as a signature pattern.

- Consensus pattern: [KRE]-x-L-x(2)-[PS]-[KR]-x(2)-[RH]-[PSA]-...

[ 1] Puder M., Barnard G.F., Staniunas R.J., Steele G.D. Jr., Chen L.B. Biochim. Biophys. Acta 1216:134-136(1993).

[1338] 503. Ribosomal L18p family

It has been shown that the amino terminal 93 amino acids of Swiss:P09895 are necessary and sufficient to bind 5S rRNA in vitro. The carboxyl-terminal half of the protein, comprising amino acids 151-296, serves to localize the protein to the nucleolus [1].
Number of members: 26

[1] Medline: 96212235
Distinct domains in ribosomal protein L5 mediate 5 S rRNA binding and nucleolar localization.
Michael WM, Dreyfuss G;
J Biol Chem 1996;271:11571-11574.
[1339] 504. Ribosomal protein L19 signature

Ribosomal protein L19 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L19 is known to be located at the 30S-50S ribosomal subunit interface and may play a role in the structure and function of the aminoacyl-tRNA binding site. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities, groups: - Eubacterial L19.

- Red algal chloroplast L19. - Cyanelle L19.

L19 is a protein of 120 to 130 amino-acid residues.

A conserved region in the C-terminal section has been selected as a signature pattern.

- Consensus pattern: [LIVM]-x-[KRGTI]-x-[GSAI]-[KRQDA]-[VGHRSN]-x(0,1)-...-R

[1340] 505. Ribosomal protein L19e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Mammalian ribosomal protein L19 [1]. - Drosophila ribosomal protein L19 [2].

- Slime mold (D. discoideum) vegetative specific protein V14 [3].

- Yeast ribosomal protein L19 (YL14). - Archaebacterial ribosomal protein L19E.
[1341] These proteins have 148 to 203 amino-acid residues.

A stretch of about 20 residues in the N-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: Q-[KR]-R-[LIVM]-x-[SA]-x(4)-[CV]-G-x(3)-[IV]-[WK]-[LIVF]-[DN]-P

[ 1] Chan Y.-L., Lin A., McNally J., Peleg D., Meyuhas C., Wool I.G. J. Biol. Chem. 262:1111-1115(1987).
[ 2] Hart K., Klein T., Wilcox M. Mech. Dev. 43:101-110(1993).
[ 3] Singleton C.K., Manning S.S., Ken R. Nucleic Acids Res. 17:9679-9692(1989).
[1342] 506. Ribosomal protein L1e signature (Ribosomal_L4)

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists [1,2,3,4] of: - Vertebrate L1 (L4). - Drosophila L1. - Plant L1. - Yeast L2 (Rp2).

- Fission yeast L2. - Halobacterium marismortui HmaL4 (HL6).
- Methanococcus jannaschii MJ0177.

These proteins have 246 (archaebacteria) to 427 (human) amino acids. A conserved region in the N-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: N-x(3)-[KRM]-x(2)-A-[LIVT]-x-S-A-[LIV]-x-A-[ST]-[SGA]-x(7)-[RK]-[GS]-H

[ 1] Rafti R., Gargiulo G., Manzi A., Malva C., Graziani F. Nucleic Acids Res. 17:456-456(1989).
[ 2] Presutti C., Villa T., Bozzoni I. Nucleic Acids Res. 21:3900-3900(1993).
[ 3] Bagni C., Mariottini P., Annesi F., Amaldi F. Biochim. Biophys. Acta 1216:475-478(1993).
[ 4] Arndt E., Kroemer W., Hatakeyama T. J. Biol. Chem. 265:3034-3039(1990).

[1343] 507. Ribosomal protein L2 signature
Ribosomal protein L2 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L2 is known to bind to the 23S rRNA and to have peptidyltransferase activity. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups: - Eubacterial L2.

- Algal and plant chloroplast L2. - Cyanelle L2. - Archaebacterial L2.
- Plant L2. - Slime mold L2. - Marchantia polymorpha mitochondrial L2.

- Paramecium tetraurelia mitochondrial L2. - Fission yeast K5, K37 and KD4.

- Yeast YL6. - Vertebrate L8.

The best conserved region located in the C-terminal section of these proteins has been selected as a signature pattern.

- Consensus pattern: P-x(2)-R-G-[STAIV](2)-x-N-[APK]-x-[DE]

[ 1] Marty I., Meyer Y. Nucleic Acids Res. 20:1517-1522(1992).
[ 2] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).

[1344] 508. Ribosomal protein L20 signature
Ribosomal protein L20 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L20 is known to bind directly to the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups: - Eubacterial L20. - Algal and plant chloroplast L20.

- Cyanelle L20.

L20 is a protein of about 120 amino-acid residues. A conserved region located in the central section of these proteins has been selected as a signature pattern.

- Consensus pattern: K-x(3)-[KRC]-x-[LIVM]-W-[IV]-[STNALV]-R-[LIVM]-[NS]-x(3)-[RKHS]

[ 1] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).
[1345] 509. Ribosomal protein L21e signature
A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. 
One of these families consists of: 

- Mammalian L21 [1]. - Entamoeba histolytica L21 [2]. 

- Caenorhabditis elegans L21 (C14B9.7). - Yeast L21E (URP1) [3]. 
Halobacterium marismortui HL31 [4]. 

These proteins have 160 (eukaryotes) or 95 (archaebacteria) amino-acid residues. A conserved region in the central
part of these proteins has been selected as a signature pattern. 

- Consensus pattern: G-[DE]-x-V-x(10)-[GV]-x(2)-[FYH]-x(2)-[FY]-x-G-x-T-G 
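A signature of this kind is used by scanning a protein sequence for stretches that fit the pattern. The following is a minimal sketch of such a scan, assuming the L21e pattern above has been written by hand as a regular expression. The function find_signature and the sequence EXAMPLE are hypothetical names used only for illustration, and the sequence is synthetic rather than a real L21e protein.

```python
import re

# Hand-written regular expression for the L21e consensus pattern
# G-[DE]-x-V-x(10)-[GV]-x(2)-[FYH]-x(2)-[FY]-x-G-x-T-G
L21E_REGEX = re.compile(r"G[DE].V.{10}[GV].{2}[FYH].{2}[FY].G.TG")


def find_signature(sequence, regex=L21E_REGEX):
    """Return (start, end, matched_text) tuples with 1-based inclusive positions."""
    hits = []
    for match in regex.finditer(sequence.upper()):
        hits.append((match.start() + 1, match.end(), match.group()))
    return hits


# Synthetic sequence built so that residues 3 to 28 fit the pattern.
EXAMPLE = "MM" + "GDAV" + "A" * 10 + "GAAFAAYAGATG" + "MM"

for start, end, text in find_signature(EXAMPLE):
    print(start, end, text)
```

Note that re.finditer reports non-overlapping matches only, which is usually sufficient for a signature that is expected once per protein.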

[ 1] Devi K.R.G., Chan Y.-L., Wool I.G. Biochem. Biophys. Res. Commun. 162:364-370(1989).
[ 2] Fetter R., Rozenblatt S., Nuchamowitz Y., Mirelman D. Mol. Biochem. Parasitol. 56:329-333(1992).
[ 3] Jank B., Waldherr M., Schweyen R.J. Curr. Genet. 23:15-18(1993).
[ 4] Hatakeyama T., Kimura M. Eur. J. Biochem. 172:703-711(1988).

[1346] 510. Ribosomal protein L21 signature

Ribosomal protein L21 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L21 is known to bind to the 23S rRNA in the presence of L20. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities, groups: - Eubacterial L21.

- Marchantia polymorpha chloroplast L21. - Cyanelle L21.
- Spinach chloroplast L21 (nuclear-encoded).

Eubacterial L21 is a protein of about 100 amino-acid residues; the mature form of the spinach chloroplast L21 has 200 residues. A conserved region located in the C-terminal section of these proteins has been selected as a signature pattern.

- Consensus pattern: [IVT]-x(3)-[KR]-x(3)-[KRQ]-K-x(6)-G-[HF]-R-[RQ]-x(2)-[ST]

[1347] 511. Ribosomal protein L22 signature

Ribosomal protein L22 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L22 is known to bind the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2,3], groups: - Eubacterial L22.

- Algal and plant chloroplast L22 (in legumes L22 is encoded in the nucleus instead of the chloroplast). - Cyanelle L22. - Archaebacterial L22.

- Mammalian L17. - Plant L17. - Yeast YL17.

A conserved region located in the C-terminal section of these proteins has been selected as a signature pattern.

- Consensus pattern: [RKQN]-x(4)-[RH]-[GAS]-x-G-[KRQS]-x(9)-[HDN]-[LIVM]-x-[LIVMS]-x-[LIVM]

[ 1] Gantt J.S., Baldauf S.L., Calie P.J., Weeden N.F., Palmer J.D. EMBO J. 10:3073-3078(1991).
[ 2] Madsen L.H., Kreiberg J.D., Gausing K. Curr. Genet. 19:417-422(1991).
[ 3] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).

[1348] 512. Ribosomal protein L23 signature

Ribosomal protein L23 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L23 is known to bind a specific region on the 23S rRNA; in yeast, the corresponding protein binds to a homologous site on the 26S rRNA [1]. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [2,3,4], groups: - Eubacterial L23.

- Algal and plant chloroplast L23. - Archaebacterial L23. - Mammalian L23A.
- Caenorhabditis elegans L23A (F55D10.2). - Fungi L25.
- Yeast mitochondrial YmL41 (gene MRPL41 or MRP20).

[1349] A small conserved region in the C-terminal section of these proteins, which is probably involved in rRNA-binding, has been selected as a signature pattern [2].

- Consensus pattern: [RK](2)-[AM]-[IVFYT]-[IV]-[RKT]-L-[STANEQK]-x(7)-[LIVMFT]

[ 1] El Baradi T.T.A.L., Raue H.A., van de Regt C.H.F., Verbree E.G., Planta R.J. EMBO J. 4:2101-2107(1985).
[ 2] Raue H.A., Otaka E., Suzuki K. J. Mol. Evol. 28:418-426(1989).
[ 3] Fearon K., Mason T.L. J. Biol. Chem. 267:5162-5170(1992).
[ 4] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1350] 513. Ribosomal protein L24 signature 

Ribosomal protein L24 is one of the proteins from the large ribosomal subunit. L24 belongs to a family of ribosomal proteins which, on the basis of sequence similarities, groups: - Eubacterial L24.

- Plant chloroplast L24 (nuclear-encoded). - Red algal L24. - Vertebrate L26.
- Yeast L26 (YL33). - Archaebacterial HmaL24 (HL15).

- A probable ribosomal protein from Sulfolobus acidocaldarius [1].

In their mature form, these proteins have 103 to 150 amino-acid residues.
A conserved stretch of 20 residues in their N-terminal section has been selected as a signature pattern.

- Consensus pattern: [GDEN]-D-x-V-x-[IV]-[LIVMA]-x-G-x(2)-[KRA]-[GNQ]-x(2,3)-[GA]-x-[IV]

[ 1] Ouzounis C., Kyrpides N., Sander C. Nucleic Acids Res. 23:565-570(1995).

[1351] 514. Ribosomal protein L24e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. 
One of these families consists [1] of: 

- Mammalian ribosomal protein L24.
- Yeast ribosomal protein L30A/B (Rp29) (YL21).
- Kluyveromyces lactis ribosomal protein L30.
- Arabidopsis thaliana ribosomal protein L24 homolog.
- Haloarcula marismortui ribosomal protein HL21/HL22.
- Methanococcus jannaschii MJ1201.

These proteins have 60 to 160 amino-acid residues. The most conserved region, which is located in the N-terminal region of these proteins, has been selected as a signature pattern.

- Consensus pattern: [FY]-x-[GSH]-x(2)-[IV]-x-P-G-x-G-x(2)-[FYV]-x-[KRHE]-x-D

[ 1] Chan Y.-L., Olvera J., Wool I.G. Biochem. Biophys. Res. Commun. 202:1176-1180(1994).
[1352] 515. Ribosomal protein L27 signature 

Ribosomal protein L27 is one of the proteins from the large ribosomal subunit. L27 belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups: - Eubacterial L27.

- Plant chloroplast L27 (nuclear-encoded). - Algal chloroplast L27.
- Yeast mitochondrial YmL2 (gene MRPL2 or MRP7).

The schematic relationship between these groups of proteins is shown below.

Eub. L27    Nxxxxxxxxx
Algal L27
Plant L27   tttttNxxxxxxxxxxxxx
Yeast MRP7  tttNxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

't': transit peptide.
'N': N-terminal of mature protein.
'*': position of the pattern.

- Consensus pattern: G-x-[LIVM](2)-x-R-Q-R-G-x(5)-G

[ 1] Elhag G.A., Bourque D.P. Biochemistry 31:6856-6864(1992).
[ 2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

'r^o!:^:^'S:^ .r.e.s .0. ^3.0.3 ana ohiorop.asts. The '^Vr^ne'!:l'rr 
P36525 also contains a region of similarity to prokaryotic L28 proteins. L24 from yeast is also found in the large ribosomal subunit.
Number of members: 24 

[1354] 517. Ribosomal protein L29 signature
Ribosomal protein L29 is one of the proteins from the large ribosomal subunit. L29 belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups: - Eubacterial L29. - Red algal L29.

- Archaebacterial L29. - Mammalian L35. - Caenorhabditis elegans L35 (ZK652.4).

- Yeast L35.

L29 is a protein of 63 to 138 amino-acid residues.

A conserved region located in the central section of L29 has been selected as a signature pattern.

- Consensus pattern: [KNQS]-[PSTL]-x(2)-[LIMFA]-[KRGSAN]-x-[LIWSTA]-[KR]-...-[KRCQVT]-[LIVMA]

[ 1] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1355] 518. Ribosomal protein L3 signature

Ribosomal protein L3 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L3 is known to bind to the 23S rRNA and may participate in the formation of the peptidyltransferase center of the ribosome. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2,3,4], groups: - Eubacterial L3. - Red algal L3. - Cyanelle L3.

- Archaebacterial Halobacterium marismortui HmaL3 (HL1).

- Yeast L3 (also known as trichodermin resistance protein) (gene TCM1).

- Arabidopsis thaliana L3 (genes ARP1 and ARP2). - Mammalian L3 (L4).

- Mammalian mitochondrial L3. - Yeast mitochondrial YmL9 (gene MRPL9).

A conserved region located in the central section of these proteins has been selected as a signature pattern.

- Consensus pattern: [FL]-x(6)-[DN]-x(2)-[AGS]-x-[ST]-x-G-[KRH]-G-x(2)-G-x(3)-R

[ 1] Arndt E., Kroemer W., Hatakeyama T. J. Biol. Chem. 265:3034-3039(1990).
[ 2] Graack H.-R., Grohmann L., Kitakawa M., Schaefer K.L., Kruft V. Eur. J. Biochem. 206:373-380(1992).
[ 3] Herwig S., Kruft V., Wittmann-Liebold B. Eur. J. Biochem. 207:877-885(1992).
[ 4] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).

[1356] 519. Ribosomal protein L30 signature
Ribosomal protein L30 is one of the proteins from the large ribosomal subunit. L30 belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups: - Eubacterial L30. - Archaebacterial L30.
- Drosophila L7. - Slime mold L7. - Mammalian L7. - Fungi L7 (YL8).
- Yeast mitochondrial L33.

L30 from eubacteria are small proteins of about 60 residues, those from archaebacteria are proteins of about 150 residues. Eukaryotic L7 are proteins of about 250 to 270 residues. The schematic relationship between the three groups of proteins is shown below.

Eub. L30  NxxxxxxxxxxC
Arc. L30  NxxxxxxxxxxxxxxxxxxxxxxxxxxxC
Euk. L7   NxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxC
*******

'*': position of the pattern.

The signature pattern for this family of ribosomal proteins spans the N-terminal half of the region common to all these proteins.

- Consensus pattern: [IVT]-[LIVM]-x(2)-[LF]-x-[LI]-x-[KRHQEG]-x(2)-[STNQH]-x-[IVT]-x(10)-[LMS]-[LIV]-x(2)-[LIVA]-x(2)-[LMFY]-[IVT]

[ 1] Mizuta K., Hashimoto T., Otaka E. Nucleic Acids Res. 20:1011-1016(1992).
[1357] 520. Ribosomal protein L31 signature 

Ribosomal protein L31 is one of the proteins from the large ribosomal subunit. L31 is a protein of 66 to 97 amino-acid residues which has only been found so far in eubacteria and in some algal chloroplasts.
A conserved region located in the central section of these proteins has been selected as a signature pattern.

- Consensus pattern: H-P-F-[FY]-[TI]-x(9)-G-R-[AIV]-x-[KRQ]

[1358] 521. Ribosomal protein L31e signature
A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Mammalian L31 [1]. - Chlamydomonas reinhardtii L31. - Yeast L34.
- Halobacterium marismortui HL30 [2].

These proteins have 87 to 128 amino-acid residues.

A conserved region located in the central section has been selected as a signature pattern.

- Consensus pattern: V-[KR]-[LIVM]-x(3)-[LIVM]-N-x-[AKH]-x-W-x-[KR]-G

[ 1] Tanaka T., Kuwano Y., Kuzumaki T., Ishikawa K., Ogata K. Eur. J. Biochem. 162:45-48(1987).
[ 2] Bergmann U., Arndt E. Biochim. Biophys. Acta 1050:56-60(1990).
[1359] 522. Ribosomal protein L33 signature
Ribosomal protein L33 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L33 has been shown to be on the surface of the 50S subunit. L33 belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2,3], groups: - Eubacterial L33.

- Algal and plant chloroplast L33. - Cyanelle L33.

L33 is a small protein of 49 to 66 amino-acid residues. A conserved region located in the central section of L33 has been selected as a signature pattern.

- Consensus pattern: Y-x-[ST]-x-[KR]-[NS]-x(4)-[PATQ]-x(1,2)-[LIVM]-[EA]-x(2)-K-[FY]-[CSD]

[ 1] Kruft V., Kapp U., Wittmann-Liebold B. Biochimie 73:855-860(1991).
[ 2] Sharp P.M. Gene 139:129-130(1994).
[ 3] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1360] 523. Ribosomal protein L34 signature 

[1361] Ribosomal protein L34 is one of the proteins from the large subunit of the prokaryotic ribosome. It is a small 
basic protein of 44 to 51 amino-acid residues [1]. L34 belongs to a family of ribosomal proteins which, on the basis of 
sequence similarities, groups: - Eubacterial L34.

- Red algal chloroplast L34. - Cyanelle L34.

A conserved region that corresponds to the N-terminal half of L34 has been selected as a signature pattern.

- Consensus pattern: K-[RG]-T-[FYWL]-[EQS]-x(5)-[KRHS]-x(4,5)-G-F-x(2)-R

[ 1] Old I.G., Margarita D., Saint Girons I. 
Nucleic Acids Res. 20:6097-6097(1992). 

[1362] 524. Ribosomal protein L34e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.

One of these families consists of:

- Mammalian L34. - Mosquito L31 [1]. - Plant L34 [2].

- Yeast putative ribosomal protein YIL052c. - Methanococcus jannaschii MJ0655.

These proteins have 89 to 129 amino-acid residues.

A conserved region located in the N-terminal section of these proteins has been selected as a signature pattern.

- Consensus pattern: Y-x-[ST]-x-S-[NY]-x(5)-[KR]-T-P-G

[ 1] Lan Q., Niu L.L., Fallon A.M. Biochim. Biophys. Acta 1218:460-462(1994).
[ 2] Gao J., Kim S.R., Chung Y.Y., Lee J.M., An G. Plant Mol. Biol. 25:761-770(1994).

[1363] 525. Ribosomal protein L35Ae signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.

One of these families consists of:

- Vertebrate L35A. - Caenorhabditis elegans L35A (F10E7.7).

- Yeast L37A/L37B (Rp47). - Pyrococcus woesei L35A homolog [1].

These proteins have 87 to 110 amino-acid residues.

A highly conserved stretch of 22 residues in the C-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: G-K-[LIVM]-x-R-x-H-G-x(2)-G-x-V-x-A-x-F-x(3)-[LI]-P
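The number of residues a consensus pattern spans can be checked by counting its elements, with x(n) contributing n positions and x(n,m) contributing between n and m. The sketch below, offered only as an illustration, computes the minimum and maximum span of a pattern; the function name pattern_length_range is hypothetical. Applied to the L35Ae pattern it gives a span of exactly 22 residues, in agreement with the stretch described above.

```python
import re
from typing import Tuple


def pattern_length_range(pattern: str) -> Tuple[int, int]:
    """Return the (minimum, maximum) number of residues a PROSITE-style pattern covers."""
    shortest = longest = 0
    for element in pattern.strip().rstrip(".").split("-"):
        # A trailing "(n)" or "(n,m)" repeats the element; otherwise it counts once.
        repeat = re.search(r"\((\d+)(?:,(\d+))?\)$", element)
        if repeat:
            low = int(repeat.group(1))
            high = int(repeat.group(2)) if repeat.group(2) else low
        else:
            low = high = 1
        shortest += low
        longest += high
    return shortest, longest


L35AE_PATTERN = "G-K-[LIVM]-x-R-x-H-G-x(2)-G-x-V-x-A-x-F-x(3)-[LI]-P"
print(pattern_length_range(L35AE_PATTERN))
```

A pattern containing a variable gap, such as x(2,3), yields different minimum and maximum values, which is why some signatures are described as covering "about" a given number of residues.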

[ 1] Ouzounis C, Kyrpides N., Sander C. 
Nucleic Acids Res. 23:565-570(1995). 

[1364] 526. Ribosomal protein L36 signature

Ribosomal protein L36 is the smallest protein from the large subunit of the prokaryotic ribosome. L36 belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups: - Eubacterial L36. - Algal and plant chloroplast L36. - Cyanelle L36.

L36 is a small basic and cysteine-rich protein of 37 amino-acid residues. As a signature pattern, a conserved region that corresponds to positions 11 to 36 was selected.

- Consensus pattern: C-x(2)-C-x(2)-[LIVM]-x-R-x(3)-[LIVMN]-x-[LIVM]-x-...

[ 1] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1365] 527. Ribosomal protein L36e signature

A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of: - Mammalian L36 [1].

- Drosophila L36 (M(1)1B). - Caenorhabditis elegans L36 (F37C12.4).

- Candida albicans L39. - Yeast YL39.

These proteins have 99 to 104 amino acids.
A conserved region in the central part of these proteins has been selected as a signature pattern.

- Consensus pattern: P-Y-E-[KR]-R-x-[LIVM]-[DE]-[LIVM](2)-[KR]

[ 1] Chan Y.-L., Paz V., Olvera J., Wool I.G. Biochem. Biophys. Res. Commun. 192:849-853(1993).
[1366] 528. Ribosomal protein L39e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Mammalian L39 [1]. - Plants L39. - Yeast L46 [2]. - Archaebacterial L39e [3].

These proteins are very basic. About 50 residues long, they are the smallest proteins of eukaryotic-type ribosomes. A conserved region in the C-terminal section of these proteins has been selected as a signature pattern.

- Consensus pattern: [KRA]-T-x(3)-[LIVM]-[KRQF]-x-[NHS]-x(3)-R-[NHY]-W-R-R

[ 1] Lin A., McNally J., Wool I.G. J. Biol. Chem. 259:487-490(1984).
[ 2] Leer R.J., van Raamsdonk-Duin M.M.C., Kraakman P., Mager W.H., Planta R.J. Nucleic Acids Res. 13:701-709(1985).
[ 3] Ramirez Louie K.A., Matheson A.T. FEBS Lett. 250:416-418(1989).

[1367] 529. Ribosomal L40e family 

Bovine L40 has been identified as a secondary RNA binding protein [1]. L40 is fused to a ubiquitin protein [2].
Number of members: 27

[1] Medline: 88203200
RNA binding proteins of the large subunit of bovine mitochondrial ribosomes.
Piatyszek MA, Denslow ND, O'Brien TW;
Nucleic Acids Res 1988;16:2565-2583.

[2] Medline: 96011832
The carboxyl extensions of two rat ubiquitin fusion proteins are ribosomal proteins S27a and L40.
Chan YL, Suzuki K, Wool IG;
Biochem Biophys Res Commun 1995;215:682-690.
[1368] 530. (Ribosomal L44) Ribosomal protein L44e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Mammalian L44 [1]. - Trypanosoma brucei L44.

- Caenorhabditis elegans L44 (C09H10.2). - Fungal L44 (L41).
- Halobacterium marismortui LA [2].

These proteins have 92 to 105 amino-acid residues.

A conserved region located in the C-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: K-x-[TV]-K-K-x(2)-L-[KR]-x(2)-C

[ 1] Gallagher M.J., Chan Y.-L., Lin A., Wool I.G. DNA 7:269-273(1988).
[ 2] Bergmann U., Wittmann-Liebold B. Biochim. Biophys. Acta 1173:195-200(1993).

[1369] 531. Ribosomal protein L5 signature

Ribosomal protein L5 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L5 is known to be involved in binding 5S RNA to the large ribosomal subunit. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2,3,4], groups: - Eubacterial L5.

- Algal chloroplast L5. - Cyanelle L5. - Archaebacterial L5. - Mammalian L11.
- Tetrahymena thermophila L21. - Slime mold L5 (V18). - Yeast L16 (39A).
- Plants mitochondrial L5.

L5 is a protein of about 180 amino-acid residues.

A conserved region located in the first third of these proteins has been selected as a signature pattern.

- Consensus pattern: [LIVM]-x(2)-[LIVM]-[STAVC]-[GE]-[QV]-x(2)-...

[ 1] Hatakeyama T., Hatakeyama T. Biochim. Biophys. Acta 1039:343-347(1990).
[ 2] Rosendahl G., Andreasen P.H., Kristiansen K. Gene 98:161-167(1991).
[ 3] Yang D., Gunther I., Matheson A.T., Auer J., Spicker G., Boeck A. Biochimie 73:679-682(1991).
[ 4] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).

[1370] 532. Ribosomal L5P family C-terminus

[1371] This region is found associated with Ribosomal_L5. Number of members: 60

[1372] 533. Ribosomal protein L6 signatures

[1373] Ribosomal protein L6 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L6 is known to bind directly to the 23S rRNA and is located at the aminoacyl-tRNA binding site of the peptidyltransferase center. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2,3,4], groups: - Eubacterial L6.

- Algal chloroplast L6.
- Cyanelle L6.
- Archaebacterial L6.
- Marchantia polymorpha mitochondrial L6.
- Yeast mitochondrial YmL6 (gene MRPL6).
- Mammalian L9.
- Drosophila L9.
- Plants L9.
- Yeast L9 (YL11).

[1374] While all the above proteins are evolutionarily related, it is very difficult to derive a pattern that will find them all. Two patterns were therefore created, the first to detect eubacterial, cyanelle and mitochondrial L6, the second to detect archaebacterial L6 as well as eukaryotic L9.

- Consensus pattern: [PS]-[DENS]-x-Y-K-[GA]-K-G-[LIVM]

- Consensus pattern: Q-x(3)-[LIVM]-x(2)-[KR]-x(2)-R-x-F-x-D-G-[LIVM]-Y-[LIVM]-x(2)-[K...

[ 1] Suzuki K., Olvera J., Wool I.G. Gene 93:297-300(1990).
[ 2] Schwank S., Harrer R., Schueller H.-J., Schweizer E. Curr. Genet. 24:136-140(1993).
[ 3] Golden B.L., Ramakrishnan V., White S.W. EMBO J. 12:4901-4908(1993).
[ 4] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).

[1375] 534. Ribosomal protein L6e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.

One of these families consists of:

- Mammalian ribosomal protein L6 (L6 was previously known as TAX-responsive enhancer element binding protein 107).
- Caenorhabditis elegans ribosomal protein L6 (R151.3).
- Yeast ribosomal protein YL16A/YL16B.
- Mesembryanthemum crystallinum ribosomal protein YL16-like.

These proteins have 175 (yeast) to 287 (mammalian) amino acids. A highly conserved region in the central part of these proteins has been selected as a signature pattern.

- Consensus pattern: N-x(2)-P-L-R-R-x(4)-[FY]-V-I-A-T-S-x-K

[1376] 535. Ribosomal protein L7Ae signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.

One of these families consists of:
- Vertebrate L7A (SURF3) [1]. - Plant L7A. - Yeast L7A (YL5) (Rp6).
- Yeast protein NHP2 [2]. - Yeast hypothetical protein YEL026w.

- Bacillus subtilis hypothetical protein ylxQ. - Halobacterium marismortui Hs6.
- Methanococcus jannaschii MJ1203.

[1377] These proteins have 100 to 265 amino-acid residues.

A conserved region located in the central section has been selected as a signature pattern.

- Consensus pattern: [CA]-x(4)-[IV]-P-[FY]-x(2)-[LIVM]-x-[GSQ]-[KRQ]-x(2)-L-G

[ 1] Colombo R., Yon J., Garson K., Fried M. Proc. Natl. Acad. Sci. U.S.A. 89:6358-6362(1992).
[ 2] Kolodrubetz D., Burgum A. Yeast 7:79-90(1991).

[1378] 536. Ribosomal protein L9 signature
Ribosomal protein L9 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L9 is known to bind directly to the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups: - Eubacterial L9. - Cyanobacterial L9.

- Plant chloroplast L9 (nuclear-encoded). - Red algal chloroplast L9.

A conserved region located in the N-terminal section of these proteins has been selected as a signature pattern.

- Consensus pattern: G-x(2)-[GN]-x(4)-V-x(2)-G-[FY]-x(2)-N-[FY]-L-x(5)-[GA]-x(3)-[STN]

[ 1] Hoffman D.W., Davies C., Gerchman S.E., Kycia J.H., Porter S.J., White S.W., Ramakrishnan V. EMBO J. 13:205-212(1994).
[ 2] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).

[1379] 537. Ribosomal protein S10 signature

Ribosomal protein S10 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S10 is known to be involved in binding tRNA to the ribosomes. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups:

- Eubacterial S10.
- Algal chloroplast S10.
- Cyanelle S10.
- Archaebacterial S10.
- Marchantia polymorpha and Prototheca wickerhamii mitochondrial S10.
- Arabidopsis thaliana mitochondrial S10 (nuclear encoded).
- Vertebrate S20.
- Plant S20.
- Yeast URP2.

S10 is a protein of about 100 amino-acid residues.

[1380] A conserved region located in the center of these proteins has been selected as a signature pattern.

- Consensus pattern: [AV]-x(3)-[GDNSR]-[LIVMSTA]-x(3)-G-P-[LIVM]-x-[LIVM]-P-T

[ 1] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1381] 538. Ribosomal protein S11 signature

Ribosomal protein S11 [1] plays an essential role in selecting the correct tRNA in protein biosynthesis. It is located on the large lobe of the small ribosomal subunit. S11 belongs to a family of ribosomal proteins which, on the basis of sequence similarities, groups [2]:

- Eubacterial S11.
- Algal and plant chloroplast S11.
- Cyanelle S11.
- Archaebacterial S11.
- Marchantia polymorpha and Prototheca wickerhamii mitochondrial S11.
- Acanthamoeba castellanii mitochondrial S11.
- Neurospora crassa S14 (crp-2).
- Yeast S14 (RP59 or CRY1).
- Mammalian, Drosophila, Trypanosoma, and plant S14.
- Caenorhabditis elegans S14 (F37C12.9).

One of the best conserved regions in these proteins was selected as a signature pattern.




- Consensus pattern: [LIVMF]-x-[GSTAC]-[LIVMF]-x(2)-[GSTAL]-x(0,1)-[GSN]-[LIVMF]-x-[LIVM]- ... -x-[PA]-[STCH]-[DN]

[ 1] Kimura M., Kimura J., Hatakeyama T. FEBS Lett. 240:15-20(1988).
[ 2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1382] 539. Ribosomal protein S12 signature

Ribosomal protein S12 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S12 is known to be involved in the translation initiation step. It is a very basic protein of 120 to 150 amino-acid residues. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups:

- Eubacterial S12.
- Archaebacterial S12.
- Algal and plant chloroplast S12.
- Cyanelle S12.
- Protozoa and plant mitochondrial S12.
- Yeast S28.
- Drosophila mitochondrial protein tko (Technical KnockOut).
- Mammalian S23.

The best conserved regions in these proteins, located in the center of each sequence, have been selected as a signature pattern.

- Consensus pattern: [RK]-x-P-N-S-[AR]-x-R

[ 1] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1383] 540. Ribosomal protein S12e signature

A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Vertebrate S12 [1].
- Trypanosoma brucei S12 [2].
- Caenorhabditis elegans S12 (F54E7.2).
- Drosophila S12.
- Yeast S12.

These proteins have 130 to 150 amino acids.

A conserved region in the N-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: A-L-[KRQP]-x-V-L-x(2)-[SA]-x(3)-[DN]-G-L

[ 1] ... J. Biol. Chem. ...
[ 2] Marchal C., Ismaili N., Pays E. Mol. Biochem. Parasitol. 57:331-334(1993).

[1384] 541. Ribosomal protein S13 signature

Ribosomal protein S13 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S13 is known to be involved in binding fMet-tRNA and, hence, in the initiation of translation. It is a basic protein of 115 to ... amino-acid residues and belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups:

- Eubacterial S13.
- Plant chloroplast S13 (nuclear encoded).
- Red algal chloroplast S13.
- Cyanelle S13.
- Archaebacterial S13.
- Plant mitochondrial S13.
- Mammalian and plant S18.

The best conserved regions in these proteins, located in their C-terminal part, have been selected as a signature pattern.

- Consensus pattern: [KRQS]-G-x-R-H-x(2)-[GSNH]-x(2)-[LIVMC]-R-G-Q

[ 1] Chan Y.-L., Paz V., Wool I.G. Biochem. Biophys. Res. Commun. 178:1212-1218(1991).
[ 2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1385] 542. Ribosomal protein S14p/S29e (Ribosomal protein S14 signature)

[1386] Ribosomal protein S14 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S14 is known to be required for the assembly of 30S particles and may also be responsible for determining the conformation of 16S rRNA at the A site. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups:

- Eubacterial S14.
- Algal and plant chloroplast S14.
- Cyanelle S14.
- Archaebacterial Methanococcus vannielii S14.
- Plant mitochondrial S14.
- Yeast mitochondrial MRP2.
- Mammalian S29.
- Yeast YS29A/B.

[1387] S14 is a protein of 53 to 115 amino-acid residues. Our signature pattern is based on the few conserved positions located in the center of these proteins.

[1388] Consensus pattern: [RP]-x(0,1)-C-x(11,12)-[LIVMF]-x-[LIVMF]-[SCHRG]-x(3)-[RN]

[ 1] Chan Y.-L., Suzuki K., Olvera J., Wool I.G. Nucleic Acids Res. 21:649-655(1993).
[ 2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).
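The S14 pattern contains two variable-length gaps, x(0,1) and x(11,12), so a hit can span between 21 and 23 residues. The short Python sketch below illustrates this with a hand-translated regular expression; the helper name `match_length` and both test sequences are invented for illustration only and do not correspond to real S14 proteins.

```python
import re

# Hand-translated regex for the S14 pattern quoted above:
# [RP]-x(0,1)-C-x(11,12)-[LIVMF]-x-[LIVMF]-[SCHRG]-x(3)-[RN]
S14_REGEX = re.compile(r"[RP].{0,1}C.{11,12}[LIVMF].[LIVMF][SCHRG].{3}[RN]")

def match_length(sequence: str):
    """Return the length of the first S14-signature hit, or None if absent."""
    hit = S14_REGEX.search(sequence)
    return None if hit is None else hit.end() - hit.start()

# Invented sequences built so that both gaps take their minimum or maximum size.
shortest = "R" + "C" + "A" * 11 + "LAVS" + "AAA" + "R"
longest = "R" + "G" + "C" + "A" * 12 + "LAVS" + "AAA" + "R"

print(match_length(shortest), match_length(longest))
```

Because the gaps are ranges, a scanner for this signature cannot rely on a fixed window width and has to allow for every length in between.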

[1389] 543. Ribosomal protein S15 signature

Ribosomal protein S15 is one of the proteins from the small ribosomal subunit. In Escherichia coli, this protein binds to 16S ribosomal RNA and functions at early steps in ribosome assembly. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups:

- Eubacterial S15.
- Archaebacterial Halobacterium marismortui HmaS15 (HS11).
- Plant chloroplast S15.
- Yeast mitochondrial S28.
- Mammalian S13.
- Brugia pahangi and Wuchereria bancrofti S13 (S15).
- Yeast S13 (YS15).

S15 is a protein of 80 to 250 amino-acid residues.

A conserved region located in the C-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: [LIVM]-x(2)-H-[LIVMFY]-x(5)-D-x(2)-[SAGN]-x(3)-[LF]-x(9)-[LIVM]-x(2)-[FY]

[ 1] Dang H., Ellis S.R. Nucleic Acids Res. 18:6895-6901(1990).
[ 2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1390] 544. Ribosomal protein S16 signature

[1391] Ribosomal protein S16 is one of the proteins from the small ribosomal subunit. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups:

- Eubacterial S16.
- Algal and plant chloroplast S16.
- Cyanelle S16.
- Neurospora crassa mitochondrial S24 (cyt-21).

[1392] S16 is a protein of about 100 amino-acid residues. A conserved region located in the N-terminal extremity of these proteins has been selected as a signature pattern.
[1393] Consensus pattern: [LIVMT]-x-[LIVM]-[KR]-L-[STAK]-R-x-G-[AKR]
[1394] [ 1] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).
[1395] 545. Ribosomal protein S17 signature

Ribosomal protein S17 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S17 is known to bind specifically to the 5' end of 16S ribosomal RNA and is thought to be involved in the recognition of termination codons. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2,3], groups:

- Eubacterial S17.
- Plant chloroplast S17 (nuclear encoded).
- Red algal chloroplast S17.
- Cyanelle S17.
- Archaebacterial S17.
- Mammalian and plant cytoplasmic S11.
- Yeast S18a and S18b (RP41; YS12).

The best conserved regions located in the C-terminal sections of these proteins have been selected as a signature pattern.

- Consensus pattern: G-D-x-[LIV]-x-[LIVA]-x-[QEK]-x-[RK]-P-[LIV]-S

[ 1] Gantt J.S., Thompson M.D. J. Biol. Chem. 265:2763-2767(1990).
[ 2] Herfurth E., Hirano H., Wittmann-Liebold B. Biol. Chem. Hoppe-Seyler 372:955-961(1991).
[ 3] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1396] 546. Ribosomal protein S17e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Vertebrates S17 [1].
- Drosophila S17 [2].
- Neurospora crassa S17 (crp-3).
- Yeast S17a (RP51A) and S17b (RP51B) [3].
- Methanococcus jannaschii MJ0245.

These proteins have from 63 (in archebacteria) to 130 to 146 amino acids and are highly conserved. A region in the central part of these proteins has been selected as a signature.

- Consensus pattern: A-x-I-x-[ST]-K-x-L-R-N-[KR]-I-A-G-[FY]-x-T-H

[ 1] Chen I.-T., Roufa D.J. Gene 70:107-116(1988).
[ 2] Maki C., Rhoads D.D., Stewart M.J., van Slyke B., Denell R.E., Roufa D.J. Gene 79:289-298(1989).
[ 3] Abovich N., Rosbash M. Mol. Cell. Biol. 4:1871-1879(1984).

[1397] 547. Ribosomal protein S18 signature

Ribosomal protein S18 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S18 has been involved in aminoacyl-tRNA binding [1]. It appears to be situated at the tRNA A-site of the ribosome. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [2], groups:

- Eubacterial S18.
- Algal and plant chloroplast S18.
- Cyanelle S18.

As a signature pattern, a conserved region in the central section of the protein has been selected. This region contains two basic residues which may be involved in RNA-binding.

- Consensus pattern: ... -[ST]-[DERP]-x-[GY]-K-[LIVM]-x(3)-R

[ 1] McDougall J., Choli T., Kruft V., Kapp U., Wittmann-Liebold B. FEBS Lett. 245:253-260(1989).
[ 2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1398] 548. Ribosomal protein S19 signature

Ribosomal protein S19 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S19 is known to form a complex with S13 that binds strongly to 16S ribosomal RNA. S19 belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups:

- Eubacterial S19.
- Algal and plant chloroplast S19.
- Cyanelle S19.
- Archaebacterial S19.
- Plant mitochondrial S19.
- Eukaryotic S15 ('rig' protein).

S19 is a protein of 88 to 144 amino-acid residues. Our signature pattern is based on the few conserved positions located in the C-terminal section of these proteins.

- Consensus pattern: [STDNQ]-G-[KRQM]-x(6)-[LIVM]-x(4)-[LIVM]-[GSD]-x(2)-[LF]-[GAS]-[DE]-F-x(2)- ...

[ 1] Kitagawa M., Takasawa S., Kikuchi N., Itoh T., Teraoka H., Yamamoto H., Okamoto H. FEBS Lett. 283:210-214(1991).
[ 2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1399] 549. Ribosomal protein S19e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities [1,2]. One of these families consists of:

- Mammalian S19.
- Drosophila S19.
- Ascaris lumbricoides S19g (ALEP-1) and S19s.
- Yeast YS16 (RP55A and RP55B).
- Aspergillus S16.
- Halobacterium marismortui HS12.

These proteins have 143 to 155 amino acids.

A well conserved stretch of 20 residues in the C-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: P-x(6)-[SAN]-x(2)-[LIVMA]-x-R-x-[ALIV]-[LV]-Q-x-L-[EQ]

[ 1] Etter A., Aboutanos M., Tobler H., Mueller F. Proc. Natl. Acad. Sci. U.S.A. 88:1593-1596(1991).
[ 2] Suzuki K., Olvera J., Wool I.G. Biochimie 72:299-302(1990).

[1400] 550. Ribosomal protein S2 signatures

Ribosomal protein S2 is one of the proteins from the small ribosomal subunit. S2 belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups:

- Eubacterial S2.
- Algal and plant chloroplast S2.
- Cyanelle S2.
- Archaebacterial S2.
- Higher eukaryotes P40 (previously thought to be a laminin receptor).
- Yeast NAB1.
- Plant mitochondrial S2.
- Yeast mitochondrial MRP4.

S2 is a protein of 235 to 394 amino-acid residues.

Two conserved regions have been selected as signature patterns. One is located in the N-terminal section and the other in the central section.

- Consensus pattern: [LIVMFA]-x(2)-[LIVMFYC](2)-x-[STAC]-[GSTANQEKR]-[STALV]-[HY]-[LIVMF]-G

- Consensus pattern: P-x(2)-[LIVMF](2)-[LIVMS]-x-[GDN]-x(3)-[DENL]-x(3)-[LIVM]-x-E-x(4)-[GNQKRH]-[LIVM]-[AP]

[ 1] Davis S.C., Tzagoloff A., Ellis S.R. J. Biol. Chem. 267:5508-5514(1992).
[ 2] Tohgo A., Takasawa S., Munakata H., Yonekura H., Hayashi N., Okamoto H. FEBS Lett. 340:133-138(1994).
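Where a family is described by two signatures, as for S2, a candidate sequence can be checked for both, together with their relative order (N-terminal pattern first, central pattern second). The following Python sketch shows one way this might be done. The regular expressions are hand translations of the two S2 patterns, and the function name `find_s2_signatures` and the test sequence are assumptions made for illustration; the test sequence is invented and is not a real S2 protein.

```python
import re
from typing import Dict, Optional

# Hand translations of the two S2 consensus patterns listed above.
S2_N_TERMINAL = re.compile(
    r"[LIVMFA].{2}[LIVMFYC]{2}.[STAC][GSTANQEKR][STALV][HY][LIVMF]G")
S2_CENTRAL = re.compile(
    r"P.{2}[LIVMF]{2}[LIVMS].[GDN].{3}[DENL].{3}[LIVM].E.{4}[GNQKRH][LIVM][AP]")

def find_s2_signatures(sequence: str) -> Dict[str, Optional[int]]:
    """Return the 0-based start of each S2 signature, or None if it is absent."""
    first = S2_N_TERMINAL.search(sequence)
    second = S2_CENTRAL.search(sequence)
    return {
        "n_terminal": first.start() if first else None,
        "central": second.start() if second else None,
    }

# Invented sequence carrying one instance of each signature, separated by filler.
toy_sequence = ("MK" + "LAALLASGSHLG" + "Q" * 20
                + "PAALLLAGAAADAAALAEAAAAGLA" + "KK")
positions = find_s2_signatures(toy_sequence)
print(positions)
```

Requiring both hits, in the expected order, is stricter than accepting either pattern alone and may reduce chance matches on short motifs.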

[1401] 551. Ribosomal protein S21 signature

[1402] Ribosomal protein S21 is one of the proteins from the small ribosomal subunit. So far S21 has only been found in eubacteria. It is a protein of 55 to 70 amino-acid residues. A conserved region in the N-terminal section of the protein has been selected as a signature pattern.
[1403] Consensus pattern: [DE]-x-A-[LIY]-[KR]-R-F-K-[KR]-x(3)-[KR]
[1404] 552. Ribosomal protein S21e signature

A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Mammalian S21 [1].
- Caenorhabditis elegans S21 (F37C12.11).
- Rice S21 [2].
- Yeast S21 (Ys25) [3].
- Fission yeast S28 [4].

These proteins have 82 to 87 amino acids.

A perfectly conserved nonapeptide in the N-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: L-Y-V-P-R-K-C-S-[SA]

[ 1] Bhat K.S., Morrison S.G. Nucleic Acids Res. 21:2939-2939(1993).
[ 2] Nishi R., Hashimoto H., Uchimiya H., Kato A. Biochim. Biophys. Acta 1216:113-114(1993).
[ 3] Suzuki K., Otaka E. Nucleic Acids Res. 16:6223-6223(1988).
[ 4] Itoh T., Otaka E., Matsui K.A. Biochemistry 24:7418-7423(1985).

[1405] 553. Ribosomal protein S24e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.

One of these families consists of:

- Vertebrate S24 [1].
- Yeast Rp50.
- Mucor racemosus S24 [2].
- Halobacterium marismortui HS15 [3].
- Methanococcus jannaschii MJ0394.

These proteins have 101 to 148 amino acids.

A well conserved stretch in the central part of these proteins has been selected as a signature pattern.

- Consensus pattern: [FYA]-G-x(2)-[KR]-[STA]-x-G-[FY]-[GA]-x-[LIVM]-Y-[DN]-[SDN]

[ 1] Brown S.J., Jewell A., Maki C.O., Roufa D.J. Gene 91:293-296(1990).
[ 2] Sosa L., Fonzi W.A., Sypherd P.S.
[1406] Nucleic Acids Res. 17:9319-9331(1989).
[ 3] Kimura J., Arndt E., Kimura M. FEBS Lett. 224:65-70(1987).

[1407] 554. Ribosomal protein S26e signature

A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Mammalian S26 [1].
- Octopus S26 [2].
- Drosophila S26 (DS31) [3].
- Plant cytoplasmic S26.
- Fungi S26 [4].

These proteins have 114 to 127 amino acids.

A conserved octapeptide in the central part of these proteins has been selected as a signature pattern.

- Consensus pattern: [YH]-C-V-S-C-A-I-H

[ 1] Kuwano Y., Nakanishi O., Nabeshima Y., Tanaka T., Ogata K. J. Biochem. 97:983-992(1985).
[ 2] Zinov'eva R.D., Tomarev S.I. Dokl. Akad. Nauk SSSR 304:464-469(1989).
[ 3] Itoh N., Ohta K., Ohta M., Kawasaki T., Yamashina I. Nucleic Acids Res. 17:2121-2121(1989).
[ 4] Wu M., Tan H. Gene 150:401-402(1994).

[1408] 555. Ribosomal protein S28e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.

One of these families consists of:

- Mammalian S28 [1].
- Plant S28 [2].
- Fungi S33 [3].
- Methanococcus jannaschii MJ1202.

These proteins have from 64 to 78 amino acids.

A highly conserved nonapeptide from the C-terminal extremity of these proteins has been selected as a signature pattern.

- Consensus pattern: E-[ST]-E-R-E-A-R-x-L

[ 1] Chan Y.-L., Olvera J., Wool I.G. Biochem. Biophys. Res. Commun. 179:314-318(1991).
[ 2] Hwang I., Goodman H.M. Plant Physiol. 102:1357-1358(1993).
[ 3] Hoekstra R., Ferreira P.M., Bootsman T.C., Mager W.H., Planta R.J. Yeast 8:949-959(1992).

[1409] 556. Ribosomal protein S3Ae signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Mammalian S3A (was originally known as v-fos transformation effector protein).
- Caenorhabditis elegans S3A (F56F3.5).
- Plant cytoplasmic S3A (CYC07) [1].
- Yeast Rp10 (PLC1 and PLC2).
- Fission yeast Rp10 (SpAC13G6.02c).
- Methanococcus jannaschii MJ0980.

These proteins have from 220 to 250 amino acids.

A conserved stretch in their N-terminal section was selected as a signature pattern.

- Consensus pattern: [LIV]-x-[GH]-R-[IV]-x-E-x-[SC]-L-x-D-L

[ 1] Liu J.H., Reid D.M. Plant Physiol. 109:338-338(1995).

[1410] 557. Ribosomal protein S3 signature

Ribosomal protein S3 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S3 is known to be involved in the binding of initiator Met-tRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups:

- Eubacterial S3.
- Algal and plant chloroplast S3.
- Cyanelle S3.
- Archaebacterial S3.
- Plant mitochondrial S3.
- Vertebrate S3.
- Insect S3.
- Caenorhabditis elegans S3 (C23G10.3).
- Yeast S3 (Rp13).

S3 is a protein of 209 to 559 amino-acid residues.

A conserved region located in the C-terminal section has been selected as a signature pattern.

- Consensus pattern: [GSTA]-[KR]-x(6)-G-x-[LIVMT]-x(2)-[NQSCH]-x(1,3)-[LIVFCA]-x(3)-[LIV]-[DENQ]-x(7)-[LMT]-x(2)-G-x(2)-G

[ 1] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).
[1411] 558. Ribosomal protein S4 signature

Ribosomal protein S4 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S4 is known to bind directly to 16S ribosomal RNA. Mutations in S4 have been shown to increase translational error frequencies. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups:

- Eubacterial S4.
- Algal and plant chloroplast S4.
- Cyanelle S4.
- Archaebacterial S4.
- Mammalian S9.
- Yeast YS11 (SUP45).
- Marchantia polymorpha mitochondrial S4.
- Dictyostelium discoideum rp1024.
- Yeast protein NAM9 [3]. NAM9 has been characterized as a suppressor for ochre mutations in mitochondrial DNA. It could be a ribosomal protein that acts as a suppressor by decreasing translation accuracy.

S4 is a protein of 171 to 205 amino-acid residues (except for NAM9 which is much larger). The signature pattern for this protein is based on a conserved region located in the central section of these proteins.

- Consensus pattern: [LIVM]-[DE]-x-R-[LI]-x(3)-[LIVMC]-[VMFYHQ]-[KRT]-x(3)-[STAGCVF]-x-[ST]-x(3)-[SAI]-[KR]-x-[LIVMF](2)

[ 1] Mizuta K., Hashimoto T., Suzuki K.I., Otaka E. Nucleic Acids Res. 19:2603-2608(1991).
[ 2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).
[ 3] Boguta M., Dmochowska A., Borsuk P., Wrobel K., Gargouri A., Lazowska J., Slonimski P., Szczesniak B., Kruszewska A. Mol. Cell. Biol. 12:402-412(1992).

[1412] 559. Ribosomal protein S4e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Mammalian S4 [1]. Two highly similar isoforms of this protein exist: one coded by a gene on chromosome Y, and the other on chromosome X.
- Plant cytoplasmic S4 [2].
- Yeast S7 (YS6).
- Archebacterial S4e.

These proteins have 233 to 264 amino acids.

A highly conserved stretch of 15 residues in their N-terminal section has been selected as a signature pattern. Four positions in this region are positively charged residues.

- Consensus pattern: H-x-K-R-[LIVMF]-[SANK]-x-P-x(2)-[WY]-x-[LIVM]-x-[KRP]

[ 1] Fisher E.M., Beer-Romero P., Brown L.G., Ridley A., McNeil J.A., Lawrence J.B., Willard H.F., Bieber F.R., Page D.C. Cell 63:1205-1218(1990).
[ 2] Braun H.P., Emmermann M., Mentzel H., Schmitz U.K. Biochim. Biophys. Acta 1218:435-438(1994).

[1413] 560. Ribosomal protein S5 signature

Ribosomal protein S5 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S5 is known to be important in the assembly and function of the 30S ribosomal subunit. Mutations in S5 have been shown to increase translational error frequencies. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups:

- Eubacterial S5.
- Cyanelle S5.
- Red algal chloroplast S5.
- Archaebacterial S5.
- Mammalian S2 (LLrep3).
- Caenorhabditis elegans S2 (C49H3.11).
- Drosophila S2.
- Plant S2.
- Yeast S4 (SUP44).
- Fungi mitochondrial S5.

S5 is a protein of 166 to 254 amino-acid residues. The signature pattern for this protein is based on a conserved region, rich in glycine residues, and located in the N-terminal section of these proteins.

- Consensus pattern: G-[KRQ]-x(3)-[FY]-x-[ACV]-x(2)-[LIVMA]-[LIVM]-[AG]-[DN]-x(2)-G-x-[LIVM]-G-x ... (5,6)-[DEQ]-[LIVMA]-x(2)-A-[LIVMF]

[ 1] All-Robyn J.A., Brown N., Otaka E., Liebman S.W. Mol. Cell. Biol. 10:6544-6553(1990).
[ 2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).
[1414] 561. Ribosomal protein S6 signature

Ribosomal protein S6 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S6 is known to bind together with S18 to 16S ribosomal RNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities, groups:

- Eubacterial S6.
- Red algal chloroplast S6.
- Cyanelle S6.

S6 is a protein of 95 to 208 amino-acid residues. The signature pattern for this protein is based on a conserved region located in the N-terminal section of these proteins.

- Consensus pattern: G-x-[KRC]-[DENQRH]-L-[SA]-Y-x-I-[KRNSA]

[1415] 562. Ribosomal protein S6e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.

One of these families consists of:

- Mammalian S6 [1].
- Drosophila S6 [2].
- Plant S6 [3].
- Yeast S10 (YS4).
- Halobacterium marismortui HS13 [4].
- Methanococcus jannaschii MJ1260.

S6 is the major substrate of protein kinases in eukaryotic ribosomes [5]; it may have an important role in controlling cell growth and proliferation through the selective translation of particular classes of mRNA.

These proteins have 135 to 249 amino acids.

A conserved stretch of 12 residues in the N-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: [LIVM]-[STAMR]-G-G-x-D-x(2)-G-x-P-M

[ 1] Franco R., Rosenfeld M.G. J. Biol. Chem. 265:4321-4325(1990).
[ 2] Watson K.L., Konrad K.D., Woods D.F., Bryant P.J. Proc. Natl. Acad. Sci. U.S.A. 89:11302-11306(1992).
[ 3] Hansen G., Estruch J.J., Spena A. Nucleic Acids Res. 20:5230-5230(1992).
[ 4] Kimura M., Arndt E., Hatakeyama T., Hatakeyama T., Kimura J. Can. J. Microbiol. 35:195-199(1989).
[ 5] Bandi H.R., Ferrari S., Krieg J., Meyer H.E., Thomas G. J. Biol. Chem. 268:4530-4533(1993).

[1416] 563. Ribosomal protein S7 signature

Ribosomal protein S7 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S7 is known to bind directly to part of the 3' end of 16S ribosomal RNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2,3], groups:

- Eubacterial S7.
- Algal and plant chloroplast S7.
- Cyanelle S7.
- Archaebacterial S7.
- Plant mitochondrial S7.
- Mammalian S5.
- Plant S5.
- Caenorhabditis elegans S5 (T05E11.1).

The best conserved region located in the N-terminal section of these proteins has been selected as a signature pattern.

- Consensus pattern: [DENSK]-x-[LIVMDET]-x(3)-[LIVMFTA](2)-x(6)-G-K-[KR]-x(5)-[LIVMF]-[LIVMFC] ... [STAC]

[ 1] Klussmann S., Franke P., Bergmann U., Kostka S., Wittmann-Liebold B. Biol. Chem. Hoppe-Seyler 374:305-312(1993).
[ 2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).
[ 3] Ignatovich O., Cooper M., Kulesza H.M., Beggs J.D. Nucleic Acids Res. 23:4616-4619(1995).

[1417] 564. Ribosomal protein S7e signature

[1418] A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities [1]. One of these families consists of:

- Mammalian S7.
- Xenopus S8.
- Insect S7.
- Yeast probable ribosomal protein S7 (N2212).
- Fission yeast probable ribosomal protein S7 (SpAC18G6.13c).

These proteins have about 200 amino acids. A highly conserved stretch of 14 residues which is located in the central section and which is rich in charged residues was selected as a signature pattern.
[1419] Consensus pattern: [KR]-L-x-R-E-L-E-K-K-F-[SAP]-x-[KR]-H

[1420] [ 1] Salazar C.E., Mills-Hamm D.M., Kumar V., Collins F.H. Nucleic Acids Res. 21:4147-4147(1993).
[1421] 565. Ribosomal protein S8 signature

Ribosomal protein S8 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S8 is known to bind directly to 16S ribosomal RNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups:

- Eubacterial S8.
- Algal and plant chloroplast S8.
- Cyanelle S8.
- Archaebacterial S8.
- Marchantia polymorpha mitochondrial S8.
- Mammalian S15A.
- Plant S15A.
- Yeast S22 (S24).

The best conserved region located in the C-terminal section of these proteins has been selected as a signature pattern.

- Consensus pattern: [GE]-x(2)-[LIV](2)-[STY]-[ST]-x(2)-G-[LIVM](2)-x(4)-[AG]-[KRHAYI]

[ 1] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1422] 566. Ribosomal protein S8e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities [1]. One of these families consists of:

- Mammalian S8.
- Caenorhabditis elegans S8 (F42C5.8).
- Leishmania major S8.
- Plant S8.
- Yeast S8 (S14) (Rp19).
- Archebacterial S8e.

These proteins have either about 220 amino acids (in eukaryotes) or about 125 amino acids (in archebacteria). A conserved stretch which is located in the N-terminal section and which is rich in positively charged residues has been selected as a signature pattern.

- Consensus pattern: [KR]-x(2)-[ST]-G-[GA]-x(5)-[HR]-[KG]-[KR]-x-K-x-E-[LM]-G

[ 1] Engemann S., Herfurth E., Briesemeister U., Wittmann-Liebold B. J. Protein Chem. 14:189-...(1995).

[1423] 567. Ribosomal protein S9 signature

Ribosomal protein S9 is one of the proteins from the small ribosomal subunit. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups:

- Eubacterial S9.
- Algal chloroplast S9.
- Cyanelle S9.
- Archaebacterial S9.
- Mammalian S16.
- Plant S16.
- Yeast mitochondrial ribosomal S9.

A conserved region containing many charged residues and located in the central section of these proteins has been selected as a signature pattern.

- Consensus pattern: G-G-G-x(2)-[GSA]-Q-x(2)-[SA]-x(3)-[GSA]-x-[GSTAV]-[KRHGSAL]- ...

[ 1] Chan Y.-L., Paz V., Olvera J., Wool I.G. FEBS Lett. 263:85-88(1990).
[ 2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1424] 568. Ribulose-phosphate 3-epimerase family signatures

Ribulose-phosphate 3-epimerase (EC 5.1.3.1) (also known as pentose-5-phosphate 3-epimerase or PPE) is the enzyme that converts D-ribulose 5-phosphate into D-xylulose 5-phosphate in Calvin's reductive pentose phosphate cycle. In Alcaligenes eutrophus two copies of the gene coding for PPE are known [1]: one is chromosomally encoded (cbbEC), the other one is on a plasmid (cbbEP). PPE has been found in a wide range of bacteria, archebacteria, fungi and plants. The sequence of PPE is highly related to:

- Escherichia coli D-allulose-6-phosphate 3-epimerase (gene alsE).
- Escherichia coli protein sgcE.
- Mycoplasma genitalium hypothetical protein MG112.

All these proteins have from 209 to 241 amino acid residues.

Two conserved regions, which are located respectively in the N-terminal and in the central part of these proteins, have been selected as signature patterns.

- Consensus pattern: [LIVMF]-H-[LIVMFY]-D-[LIVM]-x-D-x(1,2)-[FY]-[LIVM]-x-N-x-[STAV]

- Consensus pattern: [LIVMA]-x-[LIVM]-M-[ST]-[VS]-x-P-x(3)-G-Q-x-F-x(6)-[NK]-[LIVMC]

[ 1] Kusian B., Yoo J.G., Bednarski R., Bowien B. J. Bacteriol. 174:7337-7344(1992).

[1425] 569. (Ricin B lectin) Similarity to lectin domain of ricin beta-chain, 3 copies. 

[1426] This family consists of a triplicated domain involved in cell agglutination in ricin. 

[1427] 570. (Rotamase) PpiC-type peptidyl-prolyl cis-trans isomerase signature , - , 

Peptidyl-prolyl cis-trans isomerase (EC 5.2.1.8) (PPIase or rotamase) is an enzyme that accelerates protein folding 

by catalyzing the cis-trans isomerization of proline imidic peptide bonds in oligopeptides [1]. Most ^^arac enzed PP.^ 

b^ngt^tw? families. thecycIophilin-type(see<^^^ 

a third family has been discovered [2.3]. So far. the only biochemically characterized member of this family is the 
Escherichia coli protein parvulin (gene ppiC). a small (92 residues) cytoplasmic enzyme that prefers res^ 
dues with hydrophobic side chains like leucine and phenylalanine in the PI position of the peptides substrates. PpiC 
is evolutionary related to a number of proteins that are also probably PPiases: 

- Escherichia coli and Haemophilus influenzae ppiD. PpiD is a PPIase which contains a periplasmic ppiC-like domain 
anchored to the inner membrane and which seems to be involved in the folding of outer membrane proteins. 

- Escherichia coli surA. SurA is a periplasmic protein that contains two ppiC-like domains. 

- Nitrogen-assimilating bacteria protein nifM, which is involved in the activation and stabilization of the iron component
(nifH) of nitrogenase.

- Bacillus subtilis protein prsA, a membrane-bound lipoprotein involved in protein export.

- Lactococcus and lactobacillus protease maturation protein prtM, a membrane-bound lipoprotein involved in the
maturation of a secreted serine proteinase.

- Yeast protein ESS1/PTF1 (processing/termination factor 1).

- Drosophila protein dodo (gene dod).

- Mammalian protein PIN1.

- Campylobacter jejuni cell binding factor 2 (CBF2), a secreted antigen.

- Bacillus subtilis hypothetical protein yacD.

- Helicobacter pylori hypothetical protein HP0175.

- A hypothetical slime mold protein.

A conserved region that contains a serine which could play a role in the catalytic mechanism of these enzymes has 
been selected as a signature pattern. 

- Consensus pattern: F-[GSADEI]-x-[LVAQ]-A-x(3)-[ST]-x(3,4)-[STQ]-x(3,5)-[GER]-G-x-[LIVM]-[GS]

[ 1] Fischer G., Schmid F.X.
Biochemistry 29:2205-2212(1990).

[ 2] Rudd K.E., Sofia H.J., Koonin E.V., Plunkett G. III, Lazar S.,
Rouviere P.E. Trends Biochem. Sci. 20:14-15(1995).

[ 3] Rahfeld J.-U., Ruecknagel K.P., Schelbert B., Ludwig B., Hacker J.,
Mann K., Fischer G. FEBS Lett. 352:180-184(1994).

[1428] 571. (RrnaAD) Ribosomal RNA adenine dimethylases signature

A number of enzymes responsible for the dimethylation of adenosines in ribosomal RNAs (EC 2.1.1.48) have been
found [1,2] to be evolutionarily related. These enzymes are:

- Bacterial 16S rRNA dimethylase (gene ksgA), which acts in the biogenesis of ribosomes by catalyzing the dimethylation
of two adjacent adenosines in the loop of a conserved hairpin near the 3'-end of 16S rRNA. Inactivation of
ksgA leads to resistance to the aminoglycoside antibiotic kasugamycin.
- Yeast 18S rRNA dimethylase (gene DIM1), which is functionally similar to ksgA and dimethylates twin adenosines
in the 3'-end of 18S rRNA.
- Bacterial 'erm' methylases. These enzymes confer resistance to macrolide-lincosamide-streptogramin B (MLS)
antibiotics, such as erythromycin, by dimethylating the adenine residue at position 2058 of 23S rRNA, thus resulting
in a reduced affinity between ribosomes and the MLS antibiotics.
- Caenorhabditis elegans hypothetical protein E02H1.1.

The best conserved region in these enzymes is located in the N-terminal section and corresponds to a region that is
probably involved in S-adenosyl methionine (SAM) binding.

- Consensus pattern: [LIVM]-[LIVMFY]-[DE]-x-G-[STAPV]-G-x-[GA]-x-[LIVMF]-[ST]-x(2)-[LIVM]-x(6)-[LIVMY]-x-
[STAGV]-[LIVMFYHC]-E-x-D

[ 1] van Gemen B., van Knippenberg P.H. (In) Nucleic acid methylation, Clawson G.A., Willis D.B., Weissbach A.,
Jones P.A., Eds., pp.19-36, Alan R. Liss Inc., New-York, (1990).
[ 2] Lafontaine D., Delcour J., Glasser A.L., Desgres J., Vandenhaute J. J. Mol. Biol. 241:492-497(1994).

[1429] 572. (RuBisCO small) Ribulose bisphosphate carboxylase, small chain. 206 members 
[1430] 573. ATP/GTP-binding site motif A (P-loop) (ras)

From sequence comparisons and crystallographic data analysis it has been shown [1,2,3,4,5,6] that an appreciable
proportion of proteins that bind ATP or GTP share a number of more or less conserved sequence motifs. The best
conserved of these motifs is a glycine-rich region, which typically forms a flexible loop between a beta-strand and an
alpha-helix. This loop interacts with one of the phosphate groups of the nucleotide. This sequence motif is generally
referred to as the 'A' consensus sequence [1] or the 'P-loop' [5]. There are numerous ATP- or GTP-binding proteins in
which the P-loop is found. A number of protein families for which the relevance of the presence of such a motif has
been noted are listed below:

- ATP synthase alpha and beta subunits.
- Myosin heavy chains.
- Kinesin heavy chains and kinesin-like proteins.
- Dynamins and dynamin-like proteins.
- Guanylate kinase.
- Thymidine kinase.
- Thymidylate kinase.
- Shikimate kinase.
- Nitrogenase iron protein family (nifH/frxC).
- ATP-binding proteins involved in 'active transport' (ABC transporters) [7].
- DNA and RNA helicases [8,9,10].
- GTP-binding elongation factors (EF-Tu, EF-1 alpha, EF-G, EF-2, etc.).
- Ras family of GTP-binding proteins (Ras, Rho, Rab, Ral, Ypt1, SEC4, etc.).
- Nuclear protein ran.
- ADP-ribosylation factors family.
- Bacterial dnaA protein.
- Bacterial recA protein.
- Bacterial recF protein.
- Guanine nucleotide-binding proteins alpha subunits (Gi, Gs, Gt, Go, etc.).
- DNA mismatch repair proteins mutS family.
- Bacterial type II secretion system protein E.

Not all ATP- or GTP-binding proteins are picked up by this motif. A number of proteins escape detection because the
structure of their ATP-binding site is completely different from that of the P-loop. Examples of such proteins are the
E1-E2 ATPases or the glycolytic kinases. In other ATP- or GTP-binding proteins the flexible loop exists in a slightly
different form; this is the case for tubulins or protein kinases. A special mention must be reserved for adenylate kinase,
in which there is a single deviation from the P-loop pattern: in the last position Gly is found instead of Ser or Thr.

Consensus pattern: [AG]-x(4)-G-K-[ST]
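
As an illustration only, a consensus pattern written in this notation can be translated into a regular expression and used to scan a protein sequence. The sketch below assumes the usual conventions of the notation (x for any residue, x(n) or x(n,m) for a repeat, [..] for allowed residues, {..} for excluded residues). The function name `prosite_to_regex` and the Ras-like test sequence are hypothetical and are not part of the original description.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a consensus pattern such as '[AG]-x(4)-G-K-[ST]' into a regex."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Separate the residue specification from an optional repeat count,
        # e.g. "x(4)", "x(3,5)" or "[LIVM](2)".
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # excluded residues
        # "[...]" classes and single letters are already valid regex syntax.
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)

P_LOOP = "[AG]-x(4)-G-K-[ST]"
# Hypothetical Ras-like N-terminal sequence, used here only as test input.
sequence = "MTEYKLVVVGAGGVGKSALTIQ"

match = re.search(prosite_to_regex(P_LOOP), sequence)
if match:
    # Report the 1-based start position and the matched residues.
    print(match.start() + 1, match.group())
```

A hit of this kind only indicates that the motif is present; as noted in the text, some proteins carrying the 'A' motif do not bind ATP or GTP, and some nucleotide-binding proteins are not detected by it.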

In addition to the proteins listed above, the 'A' motif is also found in a number of other proteins. Most of these proteins
probably bind a nucleotide, but others are definitely not ATP- or GTP-binding (as for example chymotrypsin, or human

[ 1] Walker J.E., Saraste M., Runswick M.J., Gay N.J. EMBO J. 1:945-951(1982).
[ 2] Moller W., Amons R. FEBS Lett. 186:1-7(1985).
[ 3] Fry D.C., Kuby S.A., Mildvan A.S. Proc. Natl. Acad. Sci. U.S.A. 83:907-911(1986).
[ 4] Dever T.E., Glynias M.J., Merrick W.C. Proc. Natl. Acad. Sci. U.S.A. 84:1814-1818(1987).
[ 5] Saraste M., Sibbald P.R., Wittinghofer A. Trends Biochem. Sci. 15:430-434(1990).
[ 6] Koonin E.V. J. Mol. Biol. 229:1165-1174(1993).
[ 7] Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R., Gallagher M.P. J. Bioenerg. Biomembr. 22:571-592(1990).
[ 8] Hodgman T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).
[ 9] Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K., Schnier J., Slonimski P.P. Nature 337:121-122(1989).
[10] Gorbalenya A.E., Koonin E.V., Donchenko A.P., Blinov V.M. Nucleic Acids Res. 17:4713-4730(1989).

[1431] GTP-binding nuclear protein ran signature (ras)

Ran (or TC4) is a small abundant nuclear protein that binds and hydrolyzes GTP and which has been implicated in a
large number of processes including nucleocytoplasmic transport, RNA synthesis, processing and export, and cell cycle
checkpoint control [1,2]. Ran is generally included in the RAS superfamily of small GTP-binding proteins [3], but it is
only slightly related to the other RAS proteins. It also differs from RAS proteins in that it lacks cysteine residues at its
C-terminus and is therefore not subject to prenylation. Instead, ran has an acidic C-terminus. It is, however, similar to
RAS family members in requiring a specific guanine nucleotide exchange factor (GEF) and a specific GTPase activating
protein (GAP) as stimulators of overall GTPase activity. The region of the GTP-binding B motif, which in ran is perfectly
conserved, has been selected as a signature pattern.

Consensus pattern: D-T-A-G-Q-E-K-[LF]-G-G-L-R-[DE]-G-Y-Y

Proteins belonging to this family also contain a copy of the ATP/GTP-binding motif 'A' (P-loop).

[ 1] Scheffzek K., Klebe C., Fritz-Wolf K., Kabsch W., Wittinghofer A. Nature 374:378-381(1995).
[ 2] Rush M.G., Drivas G., d'Eustachio P. BioEssays 18:103-112(1996).
[ 3] Valencia A., Chardin P., Wittinghofer A., Sander C. Biochemistry 30:4637-4648(1991).

[1432] 574. recA signature

The bacterial recA protein [1,2,3,E1] is essential for homologous recombination and recombinational repair of DNA
damage. RecA has many activities: it filaments, it binds to single- and double-stranded DNA, it binds and hydrolyzes
ATP, it is also a recombinase and, finally, it interacts with lexA causing its activation and leading to its autocatalytic
cleavage. RecA is a protein of about 350 amino-acid residues. Its sequence is very well conserved [3,4,5] among
eubacterial species. It is also found in the chloroplast of plants [6]. The best conserved region, a nonapeptide located
in the middle of the sequence which is part of the monomer-monomer interface in a recA filament, has been selected
as a signature pattern.

Consensus pattern: A-L-[KR]-[IF]-[FY]-[STA]-[STAD]-[LIVMQ]-R

[ 1] Smith K.C., Wang T.-C.V. BioEssays 10:12-16(1989).
[ 2] Lloyd A.T., Sharp P.M. J. Mol. Evol. 37:399-407(1993).
[ 3] Roca A.I., Cox M.M. Prog. Nucleic Acids Res. Mol. Biol. 56:129-223(1997).
[ 4] Karlin S., Weinstock G.M., Brendel V. J. Bacteriol. 177:6881-6893(1995).
[ 5] Eisen J.A. J. Mol. Evol. 41:1105-1123(1995).
[ 6] Cerutti H.D., Osman M., Grandoni P., Jagendorf A.T. Proc. Natl. Acad. Sci. U.S.A. 89:8068-8072(1992).
[E1] http://www.tigr.org/~jeisen/RecA/RecA.html

[1433] 575. Response regulator receiver domain

This domain receives the signal from the sensor partner in bacterial two-component systems. It is usually
found N-terminal to a DNA binding effector domain.

[1] Pao GM, Saier MH; J Mol Evol 1995;40:136-154. 

[1434] 576. Ribonucleotide reductase large subunit signature

Ribonucleotide reductase (EC 1.17.4.1) [1,2] catalyzes the reductive synthesis of deoxyribonucleotides from their
corresponding ribonucleotides. It provides the precursors necessary for DNA synthesis. Ribonucleotide reductase is
an oligomeric enzyme composed of a large subunit (700 to 1000 residues) and a small subunit (300 to 400 residues).
There are regions of similarities in the sequence of the large chain from prokaryotes, eukaryotes and viruses. One of
these regions has been selected as a signature pattern.

[1435] Consensus pattern: W-x(2)-[LF]-x(6,7)-G-[LIVM]-[FYRA]-[NH]-x(3)-[STAQLIVM]-[ASC]-x(2)-[PA]

[ 1] Nillson O., Lundqvist T., Hahne S., Sjoberg B.-M. Biochem. Soc. Trans. 16:91-94(1988).
[ 2] Reichard P. Science 260:1773-1777(1993).

[1436] 577. Ribonuclease T2 family histidine active sites

The fungal ribonucleases T2 from Aspergillus oryzae, M from Aspergillus saitoi and Rh from Rhizopus niveus are
structurally and functionally related 30 Kd glycoproteins [1] that cleave the 3'-5' internucleotide linkage of RNA via a
nucleotide 2',3'-cyclic phosphate intermediate (EC 3.1.27.1). A number of other RNAses have been found to be
evolutionarily related to these fungal enzymes:

- Self-incompatibility [2] in flowering plants is often controlled by a single gene (S-gene) that has several alleles.
This gene prevents fertilization by self-pollen or by pollen bearing either of the two S-alleles expressed in the style.
The self-incompatibility glycoprotein from several higher plants of the solanaceae family has been shown [2,3] to be
a ribonuclease.
- Phosphate-starvation induced RNAses LE and LX from tomato [4]. These two enzymes are probably involved in a
phosphate-starvation rescue system.
- Escherichia coli periplasmic RNAse I (EC 3.1.27.6) (gene rna) [5].
- Aeromonas hydrophila periplasmic RNAse.
- Haemophilus influenzae hypothetical protein HI0526.

Two histidine residues have been shown [6,7] to be involved in the catalytic mechanism of RNase T2 and Rh. These
residues and the region around them are highly conserved in all the sequences described above. Two signature
patterns have been developed, one for each of the two active-site histidines. The second pattern also contains a
cysteine which is known to be involved in a disulfide bond.

Consensus pattern: [FYWL]-x-[LIVM]-H-G-L-W-P [H is an active site residue]

Consensus pattern: [LIVMF]-x(2)-[HDGTY]-[EQ]-[FYW]-x-[KR]-H-G-x-C [H is an active site residue] [C is involved in
a disulfide bond]
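
The bracketed annotation on each pattern marks one residue within the match. A minimal sketch of how the position of that residue might be reported is given below for the first pattern, [FYWL]-x-[LIVM]-H-G-L-W-P. The regular-expression form of the pattern, the capture group around H, the function name `active_site_positions` and the test sequence are assumptions made for illustration; they do not come from the original description.

```python
import re

# First ribonuclease T2 signature written as a regular expression.
# The parentheses around H are an addition of this sketch so that the
# annotated active-site histidine can be located inside each match.
T2_SITE_1 = re.compile(r"[FYWL].[LIVM](H)GLWP")

def active_site_positions(sequence: str) -> list[int]:
    """Return the 1-based position of the annotated His for every match."""
    return [m.start(1) + 1 for m in T2_SITE_1.finditer(sequence)]

# Hypothetical test sequence, invented for illustration only.
demo = "MKAAFTIHGLWPDNSG"
print(active_site_positions(demo))
```

The second pattern could be handled the same way, with one capture group for the histidine and another for the cysteine involved in the disulfide bond.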

[ 1] Watanabe H., Naitoh A., Suyama Y., Inokuchi N., Shimada H., Koyama T., Ohgi K., Irie M. J. Biochem. 108:303-310(1990).
[ 2] Haring V., Gray J.E., McClure B.A., Anderson M.A., Clarke A.E. Science 250:937-941(1990).
[ 3] McClure B.A., Haring V., Ebert P.R., Anderson M.A., Simpson R.J., Sakiyama F., Clarke A.E. Nature 342:955-957(1989).
[ 4] Loeffler A., Glund K., Irie M. Eur. J. Biochem. 214:627-633(1993).
[ 5] Meador J. III, Kennell D. Gene 95:1-7(1990).
[ 6] Kawata Y., Sakiyama F., Hayashi F., Kyogoku Y. Eur. J. Biochem. 187:255-262(1990).
[ 7] Kurihara H., Mitsui Y., Ohgi K., Irie M., Mizuno H., Nakamura K.T. FEBS Lett. 306:189-192(1992).

[1437] 578. Ribonucleotide reductase large subunit signature. Ribonucleotide reductase (EC 1.17.4.1) [1,2] catalyzes
the reductive synthesis of deoxyribonucleotides from their corresponding ribonucleotides. It provides the precursors
necessary for DNA synthesis. Ribonucleotide reductase is an oligomeric enzyme composed of a large subunit (700 to
1000 residues) and a small subunit (300 to 400 residues). There are regions of similarities in the sequence of the large
chain from prokaryotes, eukaryotes and viruses. One of these regions has been developed as a signature pattern.
[1438] Consensus pattern: W-x(2)-[LF]-x(6,7)-G-[LIVM]-[FYRA]-[NH]-x(3)-[STAQLIVM]-[ASC]-x(2)-[PA]
[1439] [ 1] Nillson O., Lundqvist T., Hahne S., Sjoberg B.-M. Biochem. Soc. Trans. 16:91-94(1988).
[ 2] Reichard P. Science 260:1773-1777(1993).

[1440] 579. RNase H

RNase H digests the RNA strand of an RNA/DNA hybrid. It is an important enzyme in the retroviral replication cycle, and
is often found as a domain associated with reverse transcriptases. The structure is a mixed alpha+beta fold with three
a/b/a layers.
[1441] 580. Eukaryotic putative RNA-binding region RNP-1 signature (rrm)

Many eukaryotic proteins that are known or supposed to bind single-stranded RNA contain one or more copies of a
putative RNA-binding domain of about 90 amino acids [1,2]. This region has been found in the following proteins:

** Heterogeneous nuclear ribonucleoproteins **
- hnRNP A1 (helix destabilizing protein) (twice).
- hnRNP A2/B1 (twice).
- hnRNP C (C1/C2) (once).
- hnRNP E (UP2) (at least once).
- hnRNP G (once).

** Small nuclear ribonucleoproteins **
- U1 snRNP 70 Kd (once).
- U1 snRNP A (once).
- U2 snRNP B" (once).

** Pre-RNA and mRNA associated proteins **
- Protein synthesis initiation factor 4B (eIF-4B) [3], a protein essential for the binding of mRNA to ribosomes (once).
- Nucleolin (4 times).
- Yeast single-stranded nucleic acid-binding protein (gene SSB1) (once).
- Yeast protein NSR1 (twice). NSR1 is involved in pre-rRNA processing; it specifically binds nuclear localization sequences.
- Poly(A) binding protein (PABP) (4 times).

** Others **
- Drosophila sex determination protein Sex-lethal (Sxl) (twice).
- Drosophila sex determination protein Transformer-2 (Tra-2) (once).
- Drosophila 'elav' protein (3 times), which is probably involved in the RNA metabolism of neurons.
- Human paraneoplastic encephalomyelitis antigen HuD (3 times) [4], which is highly similar to elav and which may
play a role in neuron-specific RNA processing.
- Drosophila 'bicoid' protein (once) [5], a segment-polarity homeobox protein that may also bind to specific mRNAs.
- La antigen (once), a protein which may play a role in the transcription of RNA polymerase III.
- The 60 Kd Ro protein (once), a putative RNP complex protein.
- A maize protein induced by abscisic acid in response to water stress, which seems to be a RNA-binding protein.
- Three tobacco proteins, located in the chloroplast [6], which may be involved in splicing and/or processing of
chloroplast RNAs (twice).
- X16 [7], a mammalian protein which may be involved in RNA processing in relation with cellular proliferation
and/or maturation.
- Insulin-induced growth response protein CI-4 from rat (twice).
- Nucleolysins TIA-1 and TIAR (3 times) [8], which possess nucleolytic activity against cytotoxic lymphocyte target
cells and may be involved in apoptosis.
- Yeast RNA15 protein, which plays a role in mRNA stability and/or poly-(A) tail length [9].

Inside the putative RNA-binding domain there are two regions which are highly conserved. The first one is a
hydrophobic segment of six residues (which is called the RNP-2 motif), the second one is an octapeptide motif (which
is called RNP-1 or RNP-CS). The position of both motifs in the domain is shown in the following schematic
representation:
[1442]
xxxxxxx######xxxxxxxxxxxxxxxxxxxxxxxxxxxxx########xxxxxxxxxxxxxxxxxxxxxxxxxx
       RNP-2                              RNP-1

The RNP-1 motif has been used as a signature pattern for this type of domain.

Consensus pattern: [RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYLM]. In most cases the residue in position 3 of
the pattern is either Tyr or Phe.

[ 1] Bandziulis R.J., Swanson M.S., Dreyfuss G. Genes Dev. 3:431-437(1989).
[ 2] Dreyfuss G., Swanson M.S., Pinol-Roma S. Trends Biochem. Sci. 13:86-91(1988).
[ 3] Milburn S.C., Hershey J.W.B., Davies M.V., Kelleher K., Kaufman R.J. EMBO J. 9:2783-2790(1990).
[ 4] Szabo A., Dalmau J., Manley G., Rosenfeld M., Wong E., Henson J., Posner J.B., Furneaux H.M. Cell 67:325-333(1991).
[ 5] Rebagliati M. Cell 58:231-232(1989).
[ 6] Li Y., Sugiura M. EMBO J. 9:3059-3066(1990).
[ 7] Ayane M., Preuss U., Koehler G., Nielsen P.J. Nucleic Acids Res. 19:1273-1278(1991).
[ 8] Kawakami A., Tian Q., Duan X., Streuli M., Schlossman S.F., Anderson P. Proc. Natl. Acad. Sci. U.S.A. 89:8681-8685(1992).
[ 9] Minvielle-Sebastia L., Winsor B., Bonneaud N., Lacroute F. Mol. Cell. Biol. 11:3075-3087(1991).
[1443] 581. Rubredoxin signature
[1444] Rubredoxins [1] are small electron-transfer prokaryotic proteins. They contain an iron atom which is ligated
by four cysteine residues. Rubredoxins are, in some cases, functionally interchangeable with ferredoxins.
[1445] A conserved region that includes two of the cysteine residues that bind the iron atom has been selected as
a pattern for these proteins.

[1446] Consensus pattern: [LIVM]-x(3)-W-x-C-P-x-C-[AGD] [The two C's bind the iron atom]

In Pseudomonas oleovorans rubredoxin 2 (gene alkG) [2], this pattern is found twice because alkG has two rubredoxin
domains. Rubrerythrin [3], a protein with inorganic pyrophosphatase activity from Desulfovibrio vulgaris, possesses a
C-terminal rubredoxin-like domain, but this domain is too divergent to be detected by the above pattern.
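
As a sketch of what "found twice" means in practice, the following counts non-overlapping occurrences of the rubredoxin pattern [LIVM]-x(3)-W-x-C-P-x-C-[AGD] in a sequence. The regular-expression form of the pattern, the function name `pattern_hits` and the two-copy test sequence are hypothetical; the test sequence is synthetic and is not the alkG sequence.

```python
import re

# Rubredoxin signature written as a regular expression (assumed translation:
# x -> ".", x(3) -> ".{3}", bracketed sets kept as character classes).
RUBREDOXIN = re.compile(r"[LIVM].{3}W.CP.C[AGD]")

def pattern_hits(sequence: str) -> list[int]:
    """Return 1-based start positions of non-overlapping pattern matches."""
    return [m.start() + 1 for m in RUBREDOXIN.finditer(sequence)]

# Synthetic sequence built to contain two copies of the motif.
two_domain = "AA" + "LKDEWRCPTCG" + "AAAAA" + "VEEKWTCPQCA" + "KK"

hits = pattern_hits(two_domain)
print(len(hits), hits)
```

A domain that has diverged from the pattern, as described for the C-terminal domain of the Desulfovibrio vulgaris protein, would simply produce no hit with this kind of scan.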

[1447] [ 1] Berg J.M., Holm R.H. (In) Iron-sulfur proteins, Spiro T.G., Ed., pp1-66, Wiley, New-York, (1982).
[ 2] Kok M., Oldenhuis R., der Linden M.P.G., Meulenberg C.H.C., Kingma J., Witholt B. J. Biol. Chem. 264:5442-5451(1989).
[ 3] van Beeumen J.J., van Driessche G., Liu M.-Y., Le Gall J. J. Biol. Chem. 266:20645-20653(1991).
[1448] 582. (rvp) Eukaryotic and viral aspartyl proteases active site

Aspartyl proteases, also known as acid proteases, (EC 3.4.23.-) are a widely distributed family of proteolytic enzymes
[1,2,3] known to exist in vertebrates, fungi, plants, retroviruses and some plant viruses. Aspartate proteases of
eukaryotes are monomeric enzymes which consist of two domains. Each domain contains an active site centered on a
catalytic aspartyl residue. The two domains most probably evolved from the duplication of an ancestral gene encoding
a primordial domain. Currently known eukaryotic aspartyl proteases are:

- Vertebrate gastric pepsins A and C (also known as gastricsin).
- Vertebrate chymosin (rennin), involved in digestion and used for making cheese.
- Vertebrate lysosomal cathepsins D (EC 3.4.23.5) and E (EC 3.4.23.34).
- Mammalian renin (EC 3.4.23.15) whose function is to generate angiotensin I from angiotensinogen in the plasma.
- Fungal proteases such as aspergillopepsin A (EC 3.4.23.18), candidapepsin (EC 3.4.23.24), mucoropepsin
(EC 3.4.23.23) (mucor rennin), endothiapepsin (EC 3.4.23.22), polyporopepsin (EC 3.4.23.29), and rhizopuspepsin
(EC 3.4.23.21).
- Yeast saccharopepsin (EC 3.4.23.25) (proteinase A) (gene PEP4). PEP4 is implicated in posttranslational regulation
of vacuolar hydrolases.
- Yeast barrierpepsin (EC 3.4.23.35) (gene BAR1); a protease that cleaves alpha-factor and thus acts as an antagonist
of the mating pheromone.
- Fission yeast sxa1 which is involved in degrading or processing the mating pheromones.

Most retroviruses and some plant viruses, such as badnaviruses, encode an aspartyl protease which is a homodimer
of a chain of about 95 to 125 amino acids. In most retroviruses, the protease is encoded as a segment of a polyprotein
which is cleaved during the maturation process of the virus. It is generally part of the pol polyprotein and, more rarely,
of the gag polyprotein. Conservation of the sequence around the two aspartates of eukaryotic aspartyl proteases and
around the single active site of the viral proteases allows us to develop a single signature pattern for both groups of
protease.

Consensus pattern: [LIVMFGAC]-[LIVMTADN]-[LIVFSA]-D-[ST]-G-[STAV]-[STAPDENQ]-x-[LIVMFSTNC]-x-[LIVMFGTA]
[D is the active site residue]

[ 1] Foltmann B. Essays Biochem. 17:52-84(1981).
[ 2] Biophys. Chem. 19:189-215(1990).
[ 3] Rao J.K.M., Erickson J.W., Wlodawer A. Biochemistry 30:4663-4671(1991).
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:105-120(1995).

[1449] 583. (rvt) Reverse transcriptase (RNA-dependent DNA polymerase)

A reverse transcriptase gene is usually indicative of a mobile element such as a retrotransposon or retrovirus. Reverse
transcriptases occur in a variety of mobile elements, including retrotransposons, retroviruses, group II introns, bacterial
msDNAs, hepadnaviruses, and caulimoviruses. Number of members: 1233

[1450] [1] Medline: 91006031. Origin and evolution of retroelements based upon their reverse transcriptase sequences.
Xiong Y, Eickbush TH; EMBO J 1990;9:3353-3362.
[1451] 584. (S-AdoMet synt) S-adenosylmethionine synthetase signatures

S-adenosylmethionine synthetase (EC 2.5.1.6) is the enzyme that catalyzes the formation of S-adenosylmethionine
(AdoMet) from methionine and ATP [1]. AdoMet is an important methyl donor for transmethylation and is also the
propylamino donor in polyamine biosynthesis. In bacteria there is a single isoform of AdoMet synthetase (gene metK),
there are two in budding yeast (genes SAM1 and SAM2) and in mammals, while in plants there is generally a multigene
family. The sequence of AdoMet synthetase is highly conserved throughout isozymes and species. Two signature
patterns have been selected for this type of enzyme; the first is a hexapeptide which seems to be involved in ATP-binding;
the second is an almost perfectly conserved glycine-rich nonapeptide.

Consensus pattern: G-A-G-D-Q-G-x(3)-G-[FYH]
Consensus pattern: G-[GA]-G-[ASC]-F-S-x-K-[DE]

[ 1] Horikawa S., Sasuga J., Shimizu K., Ozasa H., Tsukada K. J. Biol. Chem. 265:13683-13686(1990).

[1452] 585. S1 RNA binding domain

The S1 domain occurs in a wide range of RNA associated proteins. It is structurally similar to cold
shock protein which binds nucleic acids. The S1 domain has an OB-fold structure.
[1] Bycroft M, Hubbard TJ, Proctor M, Freund SM, Murzin AG; Cell 1997;88:235-242.
[1453] 586. SAICAR synthetase signatures 

Phosphoribosylaminoimidazole-succinocarboxamide synthase (EC 6.3.2.6) (SAICAR synthetase) catalyzes the seventh
step in the de novo purine biosynthetic pathway; the ATP-dependent conversion of
5'-phosphoribosyl-5-aminoimidazole-4-carboxylic acid and aspartic acid to SAICAR [1]. In bacteria (gene purC), fungi
(gene ADE1) and plants, SAICAR synthetase is a monofunctional protein; in higher vertebrates it is the N-terminal
domain of a bifunctional enzyme that also catalyzes phosphoribosylaminoimidazole carboxylase (AIRC) activity. Two
conserved regions in the central section of this enzyme have been selected as signature patterns for SAICAR synthetase.
Consensus pattern: [LIVMF](2)-P-[LIVM]-E-x-[LIVM]-[LIVMCA]-R-x(3)-[TA]-G-S
Consensus pattern: [LIVM]-[LIVMA]-D-x-K-[LIVMFY]-E-F-G
[ 1] Zalkin H., Dixon J.E. Prog. Nucleic Acid Res. Mol. Biol. 42:259-287(1992).
[1454] 587. (SCP) Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 signatures 

A variety of extracellular proteins from eukaryotes have been found to be evolutionarily related:

- Rodent sperm-coating glycoprotein (SCP), also known as acidic epididymal glycoprotein (AEG). This protein is
thought to be involved in sperm maturation [1]. It is a protein of about 220 residues and probably contains eight
disulfide bonds.
- Mammalian testis-specific protein Tpx-1 [2]. Tpx-1 is highly related to SCP's.
- Mammalian glioma pathogenesis-related protein (GliPR).
- Lizard helothermine, a toxin that blocks ryanodine receptors.
- Venom allergen 5 (Ag5) from vespid wasps and venom allergen 3 (Ag3) from fire ants. These proteins are potent
allergens and are the main cause of allergic reactions to stings from insects of the hymenoptera family [3]. Ag5/3 are
proteins of about 200 residues and contain four disulfide bonds.
- Plant pathogenesis proteins of the PR-1 family [4]. These proteins are synthesized during pathogen infection or
other stress-related responses. They are proteins of about 130 to 140 residues and probably contain three disulfide
bonds.
- Proteins Sc7 and Sc14 from the basidiomycete fungus Schizophyllum commune. These extracellular proteins are
loosely associated with fruit body hyphal walls [5]. Sc7/14 are proteins of about 180 residues and probably contain
two disulfide bonds.
- Ancylostoma secreted protein from dog hookworm.
- Yeast hypothetical proteins YJL078C, YJL079c and YKR013w.

The exact function of these proteins is not yet known. Two conserved regions located in their C-terminal half have
been selected as signature patterns. The second signature contains a cysteine which is known to be involved in a
disulfide bond in Ag5.

Consensus pattern: [GDER]-H-[FYWH]-T-Q-[LIVM](2)-W-x(2)-[STN]

Consensus pattern: [LIVMFYH]-[LIVMFY]-x-C-[NQRHS]-Y-x-[PARH]-x-[GL]-N-[LIVMFYWDN] [C is involved in a
disulfide bond]

[ 1] Mizuki N., Kasahara M. Mol. Cell. Endocrinol. 89:25-32(1992).
[ 2] Kasahara M., Gutknecht J., Brew K., Spurr N., Goodfellow P.N. Genomics 5:527-534(1989).
[ 3] Lu G., Villalba M., Coscia M.R., Hoffman D.R., King T.P. J. Immunol. 150:2823-2830(1993).
[ 4] Dixon D.C., Cutt J.R., Klessig D.F. EMBO J. 10:1317-1324(1991).
[ 5] Schuren F.H.J., Asgeirsdottir S.A., Kothe E.M., Scheer J.M.J., Wessels J.G.H. J. Gen. Microbiol. 139:2083-2090(1993).
[1455] 588. SET domain 

SET domains appear to be protein-protein interaction domains. It has been demonstrated that SET domains
mediate interactions with a family of proteins that display similarity with dual-specificity
phosphatases (dsPTPases) [2].

[1] Tripoulas N, LaJeunesse D, Gildea J, Shearn A; Genetics 1996;143:913-928. [2] Cui X, De Vivo I, Slany R, Miyamoto
A, Firestein R, Cleary ML; Nat Genet 1998;18:331-337.
[1456] 589. Src homology 3 (SH3) domain profile 

The Src homology 3 (SH3) domain is a small protein domain of about 60 amino-acid residues first identified as a
conserved sequence in the non-catalytic part of several cytoplasmic protein tyrosine kinases (e.g. Src, Abl, Lck) [1].
Since then, it has been found in a great variety of other intracellular or membrane-associated proteins [2,3,4,5]. The
SH3 domain has a characteristic fold which consists of five or six beta-strands arranged as two tightly packed
anti-parallel beta sheets. The linker regions may contain short helices [6]. The function of the SH3 domain is not well
understood. The current opinion is that they mediate assembly of specific protein complexes via binding to proline-rich
peptides [7]. In general SH3 domains are found as single copies in a given protein, but there is a significant number of
proteins with two SH3 domains and a few with 3 or 4 copies. So far, SH3 domains have been identified in the following
proteins:

- Many vertebrate, invertebrate and retroviral cytoplasmic (non-receptor) protein tyrosine kinases. In particular
in the Src, Abl, Bkt, Csk and ZAP70 families of kinases.
- Mammalian phosphatidylinositol-specific phospholipase C-gamma-1 and -2.
- Mammalian phosphatidyl inositol 3-kinase regulatory p85 subunit.
- Mammalian Ras GTPase-activating protein (GAP).
- Adaptor proteins mediating binding of guanine nucleotide exchange factors to growth factor receptors: vertebrate
GRB2, Caenorhabditis elegans sem-5 and Drosophila DRK. All of which have two SH3 domains.
- Mammalian Vav oncoprotein, a guanine nucleotide exchange factor of the CDC24 family.
- Guanine-nucleotide releasing factors of the CDC25 family: yeast CDC25, yeast SCD25, fission yeast ste6.
- MAGUK proteins. These proteins consist of at least three types of domains: one or more copies of the DHR domain,
a SH3 domain and a C-terminal guanylate kinase domain. Members of this family are: Drosophila lethal(1)discs large-1
tumor suppressor protein (gene Dlg1), mammalian tight junction protein ZO-1, vertebrate erythrocyte membrane
protein p55, Caenorhabditis elegans protein lin-2, rat protein CASK and mammalian synaptic proteins SAP90/PSD-95,
CHAPSYN-110/PSD-93, SAP97/DLG1 and SAP102.
- Miscellaneous proteins interacting with vertebrate receptor protein tyrosine kinases. Mammalian cytoplasmic
protein Nck (3 copies), oncoprotein Crk (2 copies).
- Chicken Src substrate p80/85 protein (cortactin) and the similar human hemopoietic lineage cell specific protein Hs1.
- Mammalian dihydropyridine-sensitive L-type calcium channel beta (regulatory) subunit including the related human
myasthenic syndrome antigen B (MSYB).
- Mammalian neutrophil cytosolic activators of NADPH oxidase: p47 (NCF-1), p67 (NCF-2), and a potential homolog
from Caenorhabditis elegans (B0303.7). NCF-1 and -2 have two copies of the SH3 domain while B0303.7 has four.
- Some myosin heavy chains from amoebae, slime molds and yeast (gene MYO3).
- Vertebrate and Drosophila spectrin and fodrin alpha-chain.
- Human amphiphysin.
- Yeast actin-binding protein ABP1.
- Yeast actin-binding protein SLA1 (3 copies).
- Yeast protein BEM1 and the fission yeast homolog scd2 (or ral3) (2 copies).
- Yeast BEM1-binding proteins BOI2 (BEB1) and BOB1 (BOI1).
- Yeast fusion protein FUS1.
- Yeast protein RVS167.
- Yeast protein SSU81.
- Yeast hypothetical proteins YAR014c (1 copy), YFR024c (1 copy), YHL002w (1 copy), YHR016c (1 copy), YJL020C (1 copy)
and YHR114w (2 copies) and the fission yeast homolog SpAC12C2.05c.
- Caenorhabditis elegans hypothetical protein F42H10.3.

The profile developed to detect SH3 domains is based on a structural alignment consisting of 5 gap-free
blocks and 4 linker regions totaling 62 match positions.

[ 1] Mayer B.J., Hamaguchi M., Hanafusa H. Nature 332:272-275(1988).
[ 2] Musacchio A., Gibson T., Lehto V.-P., Saraste M. FEBS Lett. 307:55-61(1992).
[ 3] Pawson T., Schlessinger J. Curr. Biol. 3:434-442(1993).
[ 4] Mayer B.J., Baltimore D. Trends Cell Biol. 3:8-13(1993).
[ 5] Pawson T. Nature 373:573-580(1995).
[ 6] Kuriyan J., Cowburn D. Curr. Opin. Struct. Biol. 3:828-837(1993).
[ 7] Morton C.J., Campbell I.D. Curr. Biol. 4:615-617(1994).
[1457] 590. Serine hydroxymethyltransferase pyridoxal-phosphate attachment site (SHMT)
Serine hydroxymethyltransferase (EC 2.1.2.1) (SHMT) [1] catalyzes the transfer of the hydroxymethyl group of serine
to tetrahydrofolate to form 5,10-methylenetetrahydrofolate and glycine. In vertebrates, it exists in a cytoplasmic and a
mitochondrial form whereas only one form is found in prokaryotes. Serine hydroxymethyltransferase is a
pyridoxal-phosphate containing enzyme. The pyridoxal-P group is attached to a lysine residue around which the
sequence is conserved.

[K is the pyridoxal-P attachment site]

[ 1] Usha R., Savithri H.S., Rao N.A. Biochim. Biophys. Acta 1204:75-83(1994).

[1458] 591. SIS domain

SIS (Sugar ISomerase) domains are found in many phosphosugar isomerases and phosphosugar binding proteins.

[1] Teplyakov A, Obmolova G, Badet-Denisot MA, Badet B, Polikarpov I; Structure 1998;6:1047-1055.

[1459] 592. (SKI) Shikimate kinase signature

Shikimate kinase (EC 2.7.1.71) catalyzes the fifth step in the biosynthesis from chorismate of the aromatic amino acids
(the shikimate pathway) in bacteria (gene aroK or aroL), plants and in fungi (where it is part of a multifunctional enzyme
which catalyzes five consecutive steps in this pathway). Shikimate kinase is a small protein of about 200 residues. A
conserved region that contains a run of three glycines has been selected as a signature pattern.

Consensus pattern: [KR]-x(2)-E-x(3)-[LIVMF]-x(8,12)-[LIVMF](2)-[SA]-x-G(3)-x-[LIVMF]. Proteins belonging to this
family also contain a copy of the ATP/GTP-binding motif 'A' (P-loop).
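
The consensus patterns in this listing use a compact notation: `x` stands for any residue, square brackets list the residues allowed at a position, and a parenthesized number or range gives a repeat count. As a minimal sketch, and not part of the described procedure, such a pattern could be translated into a regular expression as follows. The function name `prosite_to_regex` and the constants are hypothetical, and the curly-brace "excluded residues" form is handled on the assumption that it follows the usual PROSITE convention.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        element = element.strip()
        if not element:
            continue  # tolerate a stray trailing hyphen
        # Separate the residue specification from an optional (n) or (n,m) repeat.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                        # x: any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"    # {..}: excluded residues (assumed convention)
        # [..] classes and single letters are already valid regex syntax.
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{" + low + "}"
        else:
            repeat = "{" + low + "," + high + "}"
        parts.append(core + repeat)
    return "".join(parts)

# The shikimate kinase signature quoted above.
SKI_PATTERN = "[KR]-x(2)-E-x(3)-[LIVMF]-x(8,12)-[LIVMF](2)-[SA]-x-G(3)-x-[LIVMF]"
SKI_REGEX = re.compile(prosite_to_regex(SKI_PATTERN))
```

A call such as `SKI_REGEX.search(protein_sequence)` then reports whether a one-letter amino acid sequence contains the signature; a hit is only a hint of family membership, since short patterns can also match unrelated proteins.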

[1461] SNAP-25 (synaptosome-associated protein 25 kDa) proteins are components of SNARE complexes. Mem-
bers of this family contain a cluster of cysteine residues that can be palmitoylated for membrane attachment [1].
[1462] [1] Brennwald P, Kearns B, Champion K, Keranen S, Bankaitis V, Novick P; Cell 1994;79:245-258. [2] Risinger
C, Blomqvist AG, Lundell I, Lambertsson A, Nassel D, Pieribone VA, Brodin L, Larhammar D; J Biol Chem 1993;268:
24408-24414.

[1463] 594. SNF2 and others N-terminal domain 

[1464] This domain is found in proteins involved in a variety of processes including transcription regulation (e.g.,
SNF2, STH1, brahma, MOT1), DNA repair (e.g., ERCC6, RAD16, RAD5), DNA recombination (e.g., RAD54), and
chromatin unwinding (e.g., ISWI) as well as a variety of other proteins with little functional information (e.g., lodestar,
ETL1).

[1465] 595. Staphylococcal nuclease homologues (Snase)
Present in all three domains of cellular life. Four copies in the transcriptional coactivator p100. These, however, appear
to lack the active site residues of Staphylococcal nuclease. Positions 14 (Asp-21), 34 (Arg-35), 39 (Asp-40), 42 (Glu-
43) and 110 (Arg-87) [SNase numbering in parentheses] are thought to be involved in substrate-binding and
catalysis. 

[1] Ponting CP; Protein Sci 1997;6:459-463. [2] Callebaut I, Mornon JP; Biochem J 1997;321:125-132.
[1466] 596. SPRY domain

The SPRY domain is named from SPla and the RYanodine Receptor. Domain of unknown function. Distant homologues
are domains in butyrophilin/marenostrin/pyrin homologues.
[1] Ponting C, Schultz J, Bork P; Trends Biochem Sci 1997;22:193-194.

[1467] 597. (SQS PSY) Squalene and phytoene synthases signatures

Two different polyisoprene synthases have been shown [1,2,3] to share a number of regions of sequence similarities:
- Squalene synthase (EC 2.5.1.21) (farnesyl-diphosphate farnesyltransferase) (SQS), which catalyzes the conversion
of two molecules of farnesyl diphosphate (FPP) into squalene. It is the first committed step in the cholesterol biosynthetic 
pathway. The reaction carried out by SQS is catalyzed in two separate steps: the first is a head-to-head condensation
of the two molecules of FPP to form presqualene diphosphate; this intermediate is then rearranged in a NADP-de-
pendent reduction, to form squalene. SQS is found in eukaryotes. In yeast it is encoded by the ERG9 gene, in mammals 
by the FDFT1 gene. SQS seems to be membrane-bound. - Phytoene synthase (EC 2.5.1.-) (PSY), which catalyzes 
the conversion of two molecules of geranylgeranyl diphosphate (GGPP) into phytoene. It is the second step in the 
biosynthesis of carotenoids from isopentenyl diphosphate. The reaction carried out by PSY is catalyzed in two separate
steps: the first is a head-to-head condensation of the two molecules of GGPP to form prephytoene diphosphate; this
intermediate is then rearranged to form phytoene. PSY is found in all organisms that synthesize carotenoids: plants 
and photosynthetic bacteria as well as some non-photosynthetic bacteria and fungi. In bacteria PSY is encoded by
the gene crtB. In plants PSY is localized in the chloroplast. As can be seen from the description above, both SQS
and PSY share a number of functional similarities which are also reflected at the level of their primary structure. In
particular three well conserved regions are shared by SQS and PSY; they could be involved in substrate binding and/
or the catalytic mechanism. Signature patterns have been developed for the second and third conserved regions; they 
are localized in the central part of these enzymes. 

Consensus pattern: Y-[CSAM]-x(2)-[VSG]-A-[GSA]-[LIVAT]-[IV]-G-x(2)-[LMSC]-x(2)-[LIV]
Consensus pattern: [LIVM]-G-x(3)-Q-x(2,3)-N-[IF]-x-R-D-[LIVMFY]-x(2)-[DE]-x(4,7)-R-x-[FY]-x-P
[ 1] Summers C., Karst F., Charles A.D. Gene 136:185-192(1993). [ 2] Robinson G.W., Tsay Y.H., Kienzle B.K., Smith-
Monroy C.A., Bishop R.W. Mol. Cell. Biol. 13:2706-2727(1993). [ 3] Roemer S., Hugueney P., Bouvier F., Camara B.,
Kuntz M. Biochem. Biophys. Res. Commun. 196:1414-1421(1993).
[1468] 598. SRP54-type proteins GTP-binding domain signature 

The signal recognition particle (SRP) is an oligomeric complex that mediates targeting and insertion of the signal
sequence of exported proteins into the membrane of the endoplasmic reticulum. SRP consists of a 7S RNA and six
protein subunits. One of these subunits, the 54 Kd protein (SRP54), is a GTP-binding protein that interacts with the
signal sequence when it emerges from the ribosome. The N-terminal 300 residues of SRP54 include the GTP-binding
site (G-domain) and are evolutionary related to similar domains in other proteins which are listed below [1]. - Escherichia 
coli and Bacillus subtilis ffh protein (P48), a protein which seems to be the prokaryotic counterpart of SRP54. Ffh is
associated with a 4.5S RNA in the prokaryotic SRP complex. - Signal recognition particle receptor alpha subunit (dock-
ing protein), an integral membrane GTP-binding protein which ensures, in conjunction with SRP, the correct targeting 
of nascent secretory proteins to the endoplasmic reticulum membrane. The G-domain is located at the C-terminal 
extremity of the protein. - Bacterial ftsY protein, a protein which is believed to play a similar role to that of the docking 
protein in eukaryotes. The G-domain is located at the C-terminal extremity of the protein. - The pilA protein from Neis-
seria gonorrhoeae which seems to be the homolog of ftsY. - A protein from the archaebacteria Sulfolobus solfataricus.
This protein is also believed to be a docking protein. The G-domain is also at the C-terminus. - Bacterial flagellar
biosynthesis protein flhF. The best conserved regions in those domains are the sequence motifs that are part of the 
GTP-binding site, but as those regions are not specific to these proteins, they were not used as a signature pattern. 
Instead, a conserved region located at the C-terminal end of the domain was selected.

Consensus pattern: P-[LIVM]-x-[FYL]-[LIVMAT]-[GS]-x-[GS]-[EQ]-x(4)-[LIVMF]
[ 1] Althoff S., Selinger D., Wise J.A. Nucleic Acids Res. 22:1933-1947(1994).

[1469] 599. (STphosphatase) Serine/threonine specific protein phosphatases signature 

Serine/threonine specific protein phosphatases (EC 3.1.3.16) (PP) [1,2,3] are enzymes that catalyze the removal of a
phosphate group attached to a serine or a threonine residue. The enzymes listed below are evolutionary related. -
Protein phosphatase-1 (PP1) is an enzyme of broad
specificity. It is inhibited by two thermostable proteins, inhibitor-1 and -2. In mammals, there are two closely related
isoforms of PP-1: PP-1 alpha and PP-1 beta, produced by alternative splicing of the same gene. In Emericella nidulans,
PP-1 (gene bimG) plays an important role in mitosis control by reversing the action of the nimA kinase. In yeast, PP-
1 (gene SIT4) is involved in dephosphorylating the large subunit of RNA polymerase II. - Protein phosphatase-2A
(PP2A) is also an enzyme of broad specificity. PP2A is a trimeric enzyme that consists of a core composed of a catalytic
subunit associated with a 65 Kd regulatory subunit and a third variable subunit. In mammals, there are two closely
related isoforms of the catalytic subunit of PP2A: PP2A-alpha and PP2A-beta, encoded by separate genes. - Protein
phosphatase-2B (PP2B or calcineurin), a calcium-dependent enzyme whose activity is stimulated by calmodulin. It is
composed of two subunits: the catalytic A-subunit and the calcium-binding B-subunit. The specificity of PP2B is re-
stricted. In addition to the above-mentioned enzymes, some additional serine/threonine specific protein phosphatases
have been characterized and are listed below. - Mammalian phosphatase-X (PP-X), and Drosophila phosphatase-V
(PP-V) which are closely related but yet distinct from PP2A. - Yeast phosphatase PPH3, which is similar to PP2A but
with different enzymatic properties. - Drosophila phosphatase-Y (PP-Y), and yeast phosphatases Z1 and Z2 (genes
PPZ1 and PPZ2) which are closely related but yet distinct from PP1. - Drosophila retinal degeneration protein C (gene
rdgC), a calcium-binding phosphatase required to prevent light-induced retinal degeneration. - Phages Lambda and
Phi-80 ORF-221 which have been shown to have phosphatase activity and are related to mammalian PPs. The best
conserved region in these proteins is a perfectly conserved pentapeptide that can be used as a signature pattern.
Consensus pattern: [LIVM]-R-G-N-H-E

[ 1] Cohen P. Annu. Rev. Biochem. 58:453-508(1989). [ 2] Cohen P., Cohen P.T.W. J. Biol. Chem. 264:21435-21438
(1989). [ 3] Cohen P.T.W., Brewis N.D., Hughes V., Mann D.J. FEBS Lett. 268:355-359(1990).
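
Because the phosphatase signature [LIVM]-R-G-N-H-E has a fixed length, every occurrence in a sequence can be listed with its position. The following is only an illustrative sketch: the function name `motif_positions` and the test sequence are invented for the example, and positions are reported 1-based, which is an assumption about the preferred convention.

```python
import re

# [LIVM]-R-G-N-H-E written as a regex; the lookahead lets finditer()
# report overlapping occurrences as well.
PP_MOTIF = re.compile(r"(?=([LIVM]RGNHE))")

def motif_positions(sequence: str) -> list:
    """Return the 1-based start position of every motif occurrence."""
    return [m.start() + 1 for m in PP_MOTIF.finditer(sequence)]

# Invented sequence containing two copies of the motif.
example_hits = motif_positions("AAALRGNHEAAVRGNHEC")
```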
[1470] 600. Translation initiation factor SUI1 signature

In budding yeast (Saccharomyces cerevisiae), SUI1 is a translation initiation factor that functions in concert with eIF-
2 and the initiator tRNA-Met in directing the ribosome to the proper start site of translation [1]. SUI1 is a protein of 108
residues. Close homologs of SUI1 have been found [2] in mammals, insects and plants. SUI1 is also evolutionary
related to hypothetical proteins from Escherichia coli (yciH), Haemophilus influenzae (HI1225) and Methanococcus
vannielii. A conserved region in the C-terminal section has been selected as a signature pattern.
Consensus pattern: [LIVM]-[EQ]-[LIVM]-Q-G-[DEN]-[KHQ]-[KRV]

[ 1] Yoon H., Donahue T.F. Mol. Cell. Biol. 12:248-260(1992). [ 2] Fields C.A., Adams M.D. Biochem. Biophys. Res.
Commun. 198:288-291(1994).

[1471] 601. (S T dehydratase) Serine/threonine dehydratases pyridoxal-phosphate attachment site
Serine and threonine dehydratases [1,2] are functionally and structurally related pyridoxal-phosphate dependent en-
zymes: - L-serine dehydratase (EC 4.2.1.13) and D-serine dehydratase (EC 4.2.1.14) catalyze the dehydratation of L-
serine (respectively D-serine) into ammonia and pyruvate. - Threonine dehydratase (EC 4.2.1.16) (TDH) catalyzes the
dehydratation of threonine into alpha-ketobutarate and ammonia. In Escherichia coli and other microorganisms, two
classes of TDH are known to exist. One is involved in the biosynthesis of isoleucine, the other in hydroxamino acid
catabolism. Threonine synthase (EC 4.2.99.2) is also a pyridoxal-phosphate enzyme; it catalyzes the transformation
of homoserine-phosphate into threonine. It has been shown [3] that threonine synthase is distantly related to the serine/
threonine dehydratases. In all these enzymes, the pyridoxal-phosphate group is attached to a lysine residue. The
sequence around this residue is sufficiently conserved to allow the derivation of a pattern specific to serine/threonine

rn^erufp^ttf^Ts^^^^^^^^^^^^ 

[ 1] Ogawa H., Gomi T., Konishi K., Date T., Nakashima H., Nose K., Matsuda Y., Peraino C., Pitot H.C., Fujioka M.
J. Biol. Chem. 264:15818-15823(1989). [ 2] Datta P., Goss T.J., Omnaas J.R., Patil R.V. Proc. Natl. Acad. Sci. U.S.A.
84:393-397(1987). [ 3] Parsot C. EMBO J. 5:3013-3019(1986). [ 4] Grabowski R., Hofmeister A.E.M., Buckel W. Trends
Biochem. Sci. 18:297-300(1993).

[1472] Cysteine synthase/cystathionine beta-synthase P-phosphate attachment site

Cysteine synthase (CSase) is the pyridoxal-phosphate dependent enzyme responsible [1] for the formation of cysteine
from O-acetyl-serine and hydrogen sulfide with the concomitant release of acetic acid. In bacteria such as Escherichia
coli, two forms of the enzyme are known (genes cysK and cysM). In plants there are also two forms, one located in the
cytoplasm and the other in chloroplasts. Cystathionine beta-synthase [2] catalyzes the first irreversible step in homo-
cysteine transulfuration; the conjugation of homocysteine and serine forming cystathionine. Like CSase it is a pyridoxal-
phosphate dependent enzyme. The two types of enzymes are evolutionary related. The pyridoxal-phosphate group of
CSases has been shown to be attached to a lysine residue which is located in the N-terminal section of these enzymes;
the sequence around this residue is highly conserved and can be used as a signature pattern to detect this class of
enzymes.
Consensus pattern: K-x-E-x(3)-[PA]-[STAGC]-x-S-[IVAP]-K-x-R-x-[STAG]-x(2)-[LIVM] [The 2nd K is the pyridoxal-P
attachment site]

[ 1] Saito K., Kurosawa M., Murakoshi I. FEBS Lett. 328:111-114(1993). [ 2] Swaroop M., Bradley K., Ohura T., Tahara
T., Roper M.D., Rosenberg L.E., Kraus J.P. J. Biol. Chem. 267:11455-11461(1992).

[1473] 602. S-locus glycoprotein family
In Brassicaceae, self-incompatible plants have a self/non-self recognition system.
This is sporophytically controlled by multiple alleles at a single locus (S). S-locus glycoproteins,
as well as S-receptor kinases, are in linkage with the S-alleles [1]. Number of members: 128
[1] Evolutionary aspects of the S-related genes of the Brassica self-incompatibility system: synonymous and nonsyn-
onymous base substitutions. Hinata K, Watanabe M, Yamakawa S, Satta Y, Isogai A; Genetics 1995;140:1099-1104.
[2] Polymorphism of the S-locus glycoprotein gene (SLG) and the S-locus related gene (SLR1) in Raphanus sativus
L. and self-incompatible ornamental plants in the Brassicaceae. Sakamoto K, Kusaba M, Nishio T; Mol Gen Genet
1998;258:397-403.

[1474] 603. (sdh cyt) Succinate dehydrogenase cytochrome b subunit signatures 

Succinate dehydrogenase (SDH) is a membrane-bound complex of two main components: a membrane-extrinsic com-
ponent composed of an FAD-binding flavoprotein and an iron-sulfur protein, and a hydrophobic component composed
of a cytochrome b and a membrane anchor protein. The cytochrome b component is a mono heme transmembrane
protein [1,2,3] belonging to a family that groups: - Cytochrome b-556 from bacterial SDH (gene sdhC). - Cytochrome
b560 from the mammalian mitochondrial SDH complex. - Cytochrome b560 subunit encoded in the mitochondrial ge-
nome of some algae and in the plant Marchantia polymorpha. - Cytochrome b from yeast mitochondrial SDH complex
(gene SDH3 or CYB3). - Protein cyt-1 from Caenorhabditis. These cytochromes are proteins of about 130 residues that
comprise three transmembrane regions. There are two conserved histidines which may be involved in binding the heme
group. Two signature patterns have been developed that include these histidine residues.
Consensus pattern: R-P-[LIVMT]-x(3)-[LIVM]-x(6)-[LIVMWPK]-x(4)-S-x(2)-H-R-x-[ST] [H could be a heme ligand]
Consensus pattern: H-x(3)-[GA]-[LIVMT]-R-[HF]-[LIVMF]-x-[FYWM]-D-x-[GVA] [H could be a heme ligand]
[ 1] Yu L., Wei Y.-Y., Usui S., Yu C.-A. J. Biol. Chem. 267:24508-24515(1992). [ 2] Abraham P.R., Mulder A., Van 't Riet
J., Raue H.A. Mol. Gen. Genet. 242:708-716(1994). [ 3] Leblanc C., Boyen C., Richard O., Bonnard G., Grienenberger
J.M., Kloareg B. J. Mol. Biol. 250:484-495(1995).
[1475] 604. Sec1 family

[1] The Sec1 family: a novel family of proteins involved in synaptic transmission and general secretion. Halachmi N,
Lev Z; J Neurochem 1996;66:889-897.
Number of members: 40 

[1476] 605. Protein secE/sec61-gamma signature

In bacteria, the secE protein plays a role in protein export; it is one of the components - with secY and secA - of the
preprotein translocase. In eukaryotes, the evolutionary related protein sec61-gamma plays a role in protein translocation
through the endoplasmic reticulum; it is part of a trimeric complex that also consists of sec61-alpha and beta [1]. Both
secE and sec61-gamma are small proteins of about 60 to 90 amino acids that contain a single transmembrane region
at their C-terminal extremity (Escherichia coli secE is an exception, in that it possesses an extra N-terminal segment of
60 residues that contains two additional transmembrane domains). The sequence of secE/sec61-gamma is not extreme-
ly well conserved; however, it is possible to derive a signature pattern centered on a conserved proline located 10
residues before the beginning of the transmembrane domain.

Consensus pattern: [LIVMFY]-x(2)-[DENQGA]-x(4)-[LIVMFTA]-x-[KRV]-x(2)-[KW]-P-x(3)-[SEQ]-x(7)-[LIVT]-[LIVGA]-
[LIVFGAST]

[ 1] Hartmann E., Sommer T., Prehn S., Goerlich D., Jentsch S., Rapoport T.A. Nature 367:654-657(1994).
[1477] 606. 11-S plant seed storage proteins signature
Plant seed storage proteins, whose principal function appears to be the major nitrogen source for the developing plant,
can be classified, on the basis of their structure, into different families. 11-S are non-glycosylated proteins which form
hexameric structures [1,2]. Each of the subunits in the hexamer is itself composed of an acidic and a basic chain
derived from a single precursor and linked by a disulfide bond. This structure is shown in the following representation.

           +-------------------------+
           |                         |
xxxxxxxxxxxCxxxxxxxxxxxxxxxxxxxxxxNGxCxxxxxxxxxxxxxxxxxxxxxxx
                                  *********
<----------Acidic-subunit---------><-----Basic-subunit------>
<-----------------About-480-to-500-residues----------------->

'C': conserved cysteine involved in a disulfide bond. '*': position of the pattern.

Proteins that belong to the 11-S family are: pea and broad bean legumins, rape
cruciferin, rice glutelins, cotton beta-globulins, soybean glycinins, pumpkin 11-S globulin, oat globulin, sunflower heli-
anthinin G3, etc. The region that includes the conserved cleavage site between the acidic and basic subunits (Asn-
Gly) and a proximal cysteine residue which is involved in the interchain disulfide bond have been used as a signature
pattern for this family of proteins.

Consensus pattern: N-G-x-[DE](2)-x-[LIVMF]-C-[ST]-x(11,12)-[PAG]-D [C is involved in a disulfide bond]
[ 1] Hayashi M., Mori H., Nishimura M., Akazawa T., Hara-Nishimura I. Eur. J. Biochem. 172:627-632(1988). [ 2] Shotwell
M.A., Afonso C., Davies E., Chesnut R.S., Larkins B.A. Plant Physiol. 87:698-704(1988).
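
Since the 11-S signature begins at the Asn-Gly cleavage site, a match also indicates where the precursor would be divided into its acidic and basic chains. The sketch below illustrates that reading on an invented toy sequence; the function name `split_at_asn_gly` is hypothetical, and the code assumes the acidic chain is the N-terminal part, as in the representation above.

```python
import re

# Regex form of N-G-x-[DE](2)-x-[LIVMF]-C-[ST]-x(11,12)-[PAG]-D
LEGUMIN_11S = re.compile(r"NG.[DE]{2}.[LIVMF]C[ST].{11,12}[PAG]D")

def split_at_asn_gly(precursor: str):
    """Return (acidic_chain, basic_chain), or None if the signature is absent."""
    match = LEGUMIN_11S.search(precursor)
    if match is None:
        return None
    cut = match.start() + 1  # the cleavage falls between the N and the G
    return precursor[:cut], precursor[cut:]

# Invented toy precursor: a short stretch, the signature, then two residues.
toy_precursor = "MAAC" + "T" * 6 + "NGLEETICS" + "A" * 11 + "PD" + "KK"
toy_chains = split_at_asn_gly(toy_precursor)
```

Only the first match is used here; a real precursor would also need its signal peptide removed before the chain lengths mean anything.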
[1478] 607. 7S seed storage protein

[1479] 7S globulin is one of the main storage proteins of most angiosperms and gymnosperms. The 7S storage 
proteins are homotrimers. 
Number of members: 67
[1480] [1] The three-dimensional structure of canavalin from jack bean (Canavalia ensiformis). Ko TP, Ng JD, McPher-
son A; Plant Physiol 1993;101:729-744.
[1481] 608. Aspartate-semialdehyde dehydrogenase signature
Aspartate-semialdehyde dehydrogenase (ASD) catalyzes the second step in the common biosynthetic pathway leading
from Asp to diaminopimelate and Lys, to Met, and to Thr: the NADP-dependent reductive dephosphorylation of
aspartyl phosphate to L-aspartate-semialdehyde. In bacteria and fungi, ASD is a protein of about 40 Kd (340 to 370
residues) whose sequence is not extremely well conserved [1]. A conserved cysteine residue has been shown to be
important for the catalytic activity [2]. The region of conservation around the active site residue is too short to be used
as a signature pattern. Another more conserved region, located in the last third of the sequence, and which contains
both a conserved cysteine as well as a histidine has been used instead.

Consensus pattern: [LIVM]-[SADN]-x(2)-C-x-R-[LIVM]-x(4)-[GSC]-H-[STA]

[ 1] Baril C., Richaud C., Fournie E., Baranton G., Saint Girons I. J. Gen. Microbiol. 138:47-53(1992). [ 2] Karsten W.E.,
Viola R.E. Biochim. Biophys. Acta 1121:234-238(1992).

[1482] N-acetyl-gamma-glutamyl-phosphate reductase active site
N-acetyl-gamma-glutamyl-phosphate reductase (EC 1.2.1.38) (AGPR) [1,2] is the enzyme that catalyzes the third step
in the biosynthesis of arginine from glutamate,
the NADP-dependent reduction of N-acetyl-5-glutamyl phosphate into N-acetylglutamate 5-semialdehyde. In bacteria
it is a monofunctional protein of 35 to 38 Kd (gene argC) while in fungi it is part of a bifunctional mitochondrial enzyme
(gene ARG5,6, arg11 or arg-6) which contains an N-terminal acetylglutamate kinase (EC 2.7.2.8) domain and a C-terminal
AGPR domain. In the Escherichia coli enzyme, a cysteine has been shown to be implicated in the catalytic activity; the
region around this residue is well conserved and can be used as a signature pattern.
Consensus pattern: [LIVM]-[GSA]-x-P-G-C-[FY]-[AVP]-T-[GA]-x(3)-[GTAC]-[LIVM]-x-P [C is the active site residue]
[ 1] Ludovice M., Martin J.F., Carrachas P., Liras P. J. Bacteriol. 174:4606-4613(1992). [ 2] Gessert S.F., Kim J.H.,
Nargang F.E., Weiss R.L. J. Biol. Chem. 269:8189-8203(1994).
[1483] 609. Sialyltransferase family. 
Number of members: 18 
[1484] 610. SpoU rRNA Methylase family 

This family of proteins probably use S-AdoMet. Number of members: 58

[1] SpoU protein of Escherichia coli belongs to a new family of putative rRNA methylases. Koonin EV, Rudd KE; Nucleic
Acids Res 1993;21:5519-5519. [2] The spoU gene of Escherichia coli, the fourth gene of the spoT operon, is essential
for tRNA (Gm18) 2'-O-methyltransferase activity. Persson BC, Jager G, Gustafsson C; Nucleic Acids Res 1997;25:
4093-4097.

[1485] 611. Stathmin family signatures

Stathmin [1] (from the Greek 'stathmos' which means relay) is a ubiquitous intracellular protein, present in a variety
of phosphorylated forms and which serves as a relay for diverse second messenger pathways. Its expression and
phosphorylation are regulated throughout development and in response to extracellular signals regulating cell prolif-
eration, differentiation and function. Stathmin is a highly conserved protein of 149 amino acid residues. It
consists of an N-terminal domain of about 45 residues followed by a 78 residue alpha-helical domain consisting of a
heptad repeat coiled coil structure and a C-terminal domain of 25 residues. Protein SCG10 is a neuron-specific, mem-
brane-associated protein that is highly similar
to stathmin, but differs in that it contains an additional N-terminal hydrophobic segment of 32 residues which is probably
responsible for its interaction with membranes. Xenopus protein XB3 is also highly similar to stathmin and also
contains an additional N-terminal hydrophobic domain [2]. A conserved decapeptide which ends with the first three
residues of the coiled coil domain and a second pattern that corresponds to part of the central region of the coiled coil
have been selected as signatures for proteins of the stathmin family.
Consensus pattern: P-[KRQ]-[KR](2)-[DE]-x-S-L-[EG]-E

Consensus pattern: A-E-K-R-E-H-E-[KR]-E

[ 1] Sobel A. Trends Biochem. Sci. 16:301-305(1991). [ 2] Maucuer A., Moreau J., Mechali M., Sobel A. J. Biol. Chem.

[1486] 612. SUA5/yciO/yrdC family signature. The following uncharacterized proteins have been shown [1] to share
regions of similarities: - Yeast protein SUA5. - Escherichia coli hypothetical protein yciO and HI1198, the corresponding
Haemophilus influenzae protein. - Escherichia coli hypothetical protein yrdC and HI0656, the corresponding Haemo-
philus influenzae protein. - Bacillus subtilis hypothetical protein ywlC. - Mycobacterium leprae hypothetical protein in
rfe-hemK intergenic region. - Methanococcus jannaschii hypothetical protein MJ0062. These are proteins of from 20
to 46 Kd which contain a number of conserved regions in their N-terminal section. They can be picked up in the

[1487] Consensus pattern: [LIVMTA](3)-[LIVMFYC]-[PG]-T-[DE]-[STA]-x-[
[1488] [ 1] Bairoch A., Rudd K.E., Robison K. Unpublished observations (1995). 
[1489] 613. Sucrose synthase
Sucrose synthases catalyse the synthesis of sucrose from UDP-glucose and fructose. This family includes the bulk of
the sucrose synthase protein. However the carboxyl terminal region of the sucrose synthases belongs to the glycosyl
transferase family Glycos_transf_1.
[1490] 614. Sulfotransferase proteins
Number of members: 59

[1491] 615. Synaptophysin / synaptoporin signature 

Synaptophysin and synaptoporin [1] are structurally related proteins, found in the membrane of synaptic vesicles, which
may function as ionic or solute channels. These two glycoproteins seem to span the membrane four times. Both their
N- and C-termini sequences seem to be cytoplasmically located. As a signature pattern for this family of proteins, a
highly conserved region located in the beginning of the first intravesicular loop just after the first transmembrane domain
has been selected. This region contains a cysteine residue that may be involved in a disulfide bond.
Consensus pattern: L-S-V-[DE]-C-x-N-K-T [C may be involved in a disulfide bond]
[ 1] Knaus P., Marqueze-Pouey B., Scherer H., Betz H. Neuron 5:453-462(1990).
[1492] 616. Syndecans signature 

Syndecans [1,2] (from the Greek syndein; to bind together) are a family of transmembrane heparan sulfate proteogly-
cans which are implicated in the binding of extracellular matrix components and growth factors. Syndecans bind a
variety of molecules via their heparan sulfate chains and can act as receptors or as co-receptors. Structurally, these
proteins consist of four separate domains: a) A signal sequence; b) An extracellular domain (ectodomain) of variable
length and whose sequence is not evolutionary conserved in the various forms of syndecans. The ectodomain contains
the sites of attachment of the heparan sulfate glycosaminoglycan side chains; c) A transmembrane region; d) A highly
conserved cytoplasmic domain of about 30 to 35 residues which could interact with cytoskeletal proteins. The proteins
known to belong to this family are: - Syndecan 1. - Syndecan 2 or fibroglycan. - Syndecan 3 or neuroglycan or N-
syndecan. - Syndecan 4 or amphiglycan or ryudocan. - Drosophila syndecan. - Caenorhabditis elegans probable syn-
decan (F57C7.3). The signature pattern that has been developed for syndecans starts with the last residue of the
transmembrane region and includes the first 10 residues of the cytoplasmic domain. This region, which contains four
basic residues, could act as a stop transfer site.
Consensus pattern: [FY]-R-[IM]-[KR]-K(2)-D-E-G-S-Y

[ 1] Bernfield M., Kokenyesi R., Kato M., Hinkes M.T., Spring J., Gallo R.L., Lose E.J. Annu. Rev. Cell Biol. 8:365-393
(1992). [ 2] David G. FASEB J. 7:1023-1030(1993).
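
The syndecan description states that the pattern covers the last transmembrane residue plus the first 10 cytoplasmic residues and contains four basic residues. A short consistency check of those two figures against the pattern itself might look like the sketch below; the helper `expand_positions` is hypothetical, handles only the simple elements used in this pattern, and treats Lys and Arg as the basic residues, which is an assumption.

```python
import re

SYNDECAN_PATTERN = "[FY]-R-[IM]-[KR]-K(2)-D-E-G-S-Y"

def expand_positions(pattern: str) -> list:
    """Expand a simple pattern into one set of allowed residues per position."""
    positions = []
    for element in pattern.split("-"):
        # Accept a single residue or a bracketed class, with an optional (n) repeat.
        m = re.fullmatch(r"(\[[A-Z]+\]|[A-Z])(?:\((\d+)\))?", element)
        allowed = set(m.group(1).strip("[]"))
        positions.extend([allowed] * int(m.group(2) or 1))
    return positions

positions = expand_positions(SYNDECAN_PATTERN)
n_positions = len(positions)  # 1 transmembrane residue + 10 cytoplasmic residues
# Count positions where every allowed residue is Lys or Arg.
n_basic = sum(1 for allowed in positions if allowed <= {"K", "R"})
```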

[1493] 617. Syntaxin / epimorphin family signature

The following proteins have been shown to be evolutionary related [1,2,3]: - Epimorphin (or syntaxin 2), a mammalian
mesenchymal protein which plays an essential role in epithelial morphogenesis. - Syntaxin 1A (also known as antigen
HPC-1) and syntaxin 1B which are synaptic proteins which may be involved in docking of synaptic vesicles at presy-
naptic active zones. - Syntaxin 3. - Syntaxin 4, which is potentially involved in docking of synaptic vesicles at presynaptic
active zones. - Syntaxin 5, which mediates endoplasmic reticulum to Golgi transport. - Syntaxin 6, which is involved in
intracellular vesicle trafficking. - Syntaxin 7. - Yeast PEP12 (or VPS6) which is required for the transport of proteases
to the vacuole. - Yeast SED5 which is required for the fusion of transport vesicles with the Golgi complex. - Yeast SSO1
and SSO2 which are required for vesicle fusion with the plasma membrane. - Yeast VAM3, which is required for vacuolar
assembly. - Arabidopsis thaliana protein KNOLLE which may be involved in cytokinesis. - Caenorhabditis elegans
hypothetical proteins F35C8.4, F48F7.2, F55A11.2 and T01B11.3. The above proteins share the following character-
istics: a size ranging from 30 Kd to 40 Kd; a C-terminal extremity which is highly hydrophobic and is probably involved
in anchoring the protein to the membrane; a central, well conserved region, which seems to be in a coiled-coil confor-
mation. The pattern specific for this family is based on the most conserved region of the coiled coil domain.
Consensus pattern: [RQ]-x(3)-[LIVMA]-x(2)-[LIVM]-[ESH]-x(2)-[LIVMT]-x-[DEVM]-[LIVM]-x(2)-[LIVM]-[FS]-x(2)-
[LIVM]-x(3)-[LIVT]-x(2)-Q-[GADEQ]-x(2)-[LIVM]-[DNQT]-x-[LIVMF]-[DESV]-x(2)-[LIVM]

[ 1] Bennett M.K., Garcia-Arraras J.E., Elferink L.A., Peterson K., Fleming A.M., Hazuka C.D., Scheller R.H. Cell 74:
863-873(1993). [ 2] Spring J., Kato M., Bernfield M. Trends Biochem. Sci. 18:124-125(1993). [ 3] Pelham H.R.B. Cell
73:425-426(1993).
[1494] 618. Sm protein 

The U1, U2, U4/U6, and U5 small nuclear ribonucleoprotein particles (snRNPs) involved in pre-mRNA splicing contain
seven Sm proteins (B/B', D1, D2, D3, E, F and G) in common, which assemble around the Sm site present in four of
the major spliceosomal small nuclear RNAs. These proteins contain a common sequence motif in two segments, Sm1
and Sm2, separated by a short variable linker.

[1495] [1] Hermann H, Fabrizio P, Raker VA, Foulaki K, Hornig H, Brahms H, Luhrmann R; EMBO J 1995;14:
2076-2088. [2] Kambach C, Walke S, Young R, Avis JM, de la Fortelle E, Raker VA, Luhrmann R, Li J, Nagai K; Cell
1999;96:375-387.
[1496] 619. Skp1 family 

[1497] [1] Stebbins CE, Kaelin WG Jr, Pavletich NP; Science 1999;284:455-461.
[1498] 620. Protein secY signatures
The eubacterial secY protein [1] plays an important role in protein export. It interacts with the signal sequences of
secretory proteins as well as with two other components of the protein translocation system: secA and secE. SecY is
an integral plasma membrane protein of 419 to 492 amino acid residues that apparently contains ten transmembrane
segments. Such a structure probably confers to secY a 'translocator' function, providing a channel for periplasmic and
outer membrane precursor proteins. Homologs of secY are found in archaebacteria [2]. SecY is also encoded in
the chloroplast genome of some algae [3] where it could be involved in a prokaryotic-like protein export system across the

irembren^rh:^ 

Two signature patterns have been developed for secY proteins. The first corresponds to the second transmembrane
region, which is the most conserved section of these proteins. The second spans the C-terminal part of the fourth
transmembrane region, a short intracellular loop, and the N-terminal part of the fifth transmembrane region.
Consensus pattern: [GST]-[LIVMF](2)-x-[LIVM]-G-[LIVM

Consensus pattern: [LIVMFYW](2)-x-[DE]-x-[LIVMF]-[STN]-x(2)-
[ 1] Ito K. Mol. Microbiol. 6:2423-2428(1992). [ 2] Auer J., Spicker G., Boeck A. Biochimie 73:683-688(1991). [ 3] Douglas

[1499] 621. Small hydrophilic plant seed proteins signature. The following small hydrophilic plant seed
proteins are structurally related: - Arabidopsis thaliana proteins GEA1 and GEA6. - Cotton late embryogenesis abun-
dant (LEA) protein D-19. - Carrot EMB-1 protein. - Barley LEA proteins B19.1A, B19.1B, B19.3 and B19.4. - Maize late
embryogenesis abundant protein Emb564. - Radish late seed maturation protein p8B6. - Rice embryonic abundant
protein Emp1. - Sunflower 10 Kd late embryogenesis abundant protein (DS10). - Wheat Em proteins. These proteins
contain from 83 to 153 amino acid residues and may play a role [1,2] in equipping the seed for survival, maintaining
a minimal level of hydration in the dry organism and preventing the denaturation of cytoplasmic components. They
may also play a role during imbibition by controlling water uptake. As a signature pattern, the best conserved region
in the sequence of these proteins has been developed; it is a glycine-rich nonapeptide located in the N-terminal section.

[1500] Consensus pattern: G-[EQ]-T-V-V-P-G-G-T

[1501] [ 1] Dure L. III, Crouch M., Harada J., Ho T.-H.D., Mundy J., Quatrano R., Thomas T., Sung Z.R. Plant Mol.
Biol. 12:475-486(1989). [ 2] Gaubier P., Raynal M., Hull G., Huestis G.M., Grellet F., Arenas C., Pages M., Delseny M.
Mol. Gen. Genet. 238:409-418(1993).

[1502] 622. Serine carboxypeptidases, active sites

The catalytic activity of the serine carboxypeptidases, like that of the trypsin family serine proteases, is provided by a
charge relay system
involving an aspartic acid residue hydrogen-bonded to a histidine, which is itself hydrogen-bonded to a serine [1].
Proteins known to be serine carboxypeptidases are: - Barley and wheat serine carboxypeptidases I, II and III [2]. -
Yeast carboxypeptidase Y (YSCY) (gene PRC1), a vacuolar protease involved in degrading small peptides. - Yeast
KEX1 protease involved in killer toxin and alpha-factor precursor processing. - Fission yeast sxa2, a probable carbox-
ypeptidase involved in degrading or processing mating pheromones [3]. - Penicillium janthinellum carboxypeptidase
S1 [4]. - Aspergillus niger carboxypeptidase pepF. - Aspergillus satoi carboxypeptidase cpdS. - Vertebrate protective
protein / cathepsin A [5], a lysosomal protein which is not only a carboxypeptidase but also essential for the activity of
both beta-galactosidase and neuraminidase. - Mosquito vitellogenic carboxypeptidase (VCP) [6]. - Naegleria fowleri
virulence-related protein Nf314 [7]. - Yeast hypothetical protein YBR139w. - Caenorhabditis elegans hypothetical pro-
teins C08H9.1, F13D12.6, F32A5.3, F41C3.5 and K10B2.2. This family also includes: - Sorghum (s)-hydroxymandelo-
nitrile lyase (hydroxynitrile lyase) (HNL) [8], an enzyme involved in plant cyanogenesis. The sequences surrounding
the active site serine and histidine residues are highly conserved in all these serine carboxypeptidases.
Consensus pattern: [LIVM]-x-[GTA]-E-S-Y-[AG]-[GS] [S is the active site residue]

Consensu^ paSem: Suvn-x(l[^^^^^^^ ^ 

[ 1] Liao D.-I., Remington S.J. J. Biol. Chem. 265:6528-6531(1990). [ 2] Sorensen S.B., Svendsen I., Breddam K.
Carlsberg Res. Commun. 54:193-202(1989). [ 3] Imai Y., Yamamoto M. Mol. Cell. Biol. 12:1827-1834(1992). [ 4] Svendsen I.,
Hofmann T., Endrizzi J., Remington S.J., Breddam K. FEBS Lett. 333:39-43(1993). [ 5] Galjart N.J., Morreau H.,
Willemsen R., Gillemans N., Bonten E.J., d'Azzo A. J. Biol. Chem. 266:14754-14762(1991). [ 6] Cho W.-L., Deitsch K.W.,
Raikhel A.S. Proc. Natl. Acad. Sci. U.S.A. 88:10821-10824(1991). [ 7] Hu W.-N., Kopachik W., Band R.N. Infect.
Immun. 60:2418-2424(1992). [ 8] Wajant H., Mundry K.-W., Pfizenmaier K. Plant Mol. Biol. 26:735-746(1994).
[ 9] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994). [E1]
[1503] 623. Serpins signature. Serpins (SERine Proteinase INhibitors) [1,2,3,4] are a group of structurally related
proteins. They are high molecular weight (400 to 500 amino acids), extracellular, irreversible serine protease inhibitors
with a well defined structural-functional characteristic: a reactive region that acts as a "bait" for an appropriate serine
protease. This region is found in the C-terminal part of these proteins. Proteins which are known to belong to the serpin
family are listed below (references are only provided for recently determined sequences): - Alpha-1 protease inhibitor
(alpha-1-antitrypsin, contrapsin). - Alpha-1-antichymotrypsin. - Antithrombin III. - Alpha-2-antiplasmin. - Heparin
cofactor II. - Complement C1 inhibitor. - Plasminogen activator inhibitors 1 (PAI-1) and 2 (PAI-2). - Glia derived nexin
(GDN) (Protease nexin I). - Protein C inhibitor. - Rat hepatocytes SPI-1, SPI-2 and SPI-3 inhibitors. - Human squamous
cell carcinoma antigen (SCCA), which may act in the modulation of the host immune response against tumor cells. - A
lepidopteran protease inhibitor. - Leukocyte elastase inhibitor which, in contrast to other serpins, is an intracellular
protein. - Neuroserpin [5], a neuronal inhibitor of plasminogen activators and plasmin. - Cowpox virus crmA [6], an
inhibitor of the thiol protease interleukin-1B converting enzyme (ICE). CrmA is the only serpin known to inhibit a
non-serine proteinase. - Some orthopoxviruses probable protease inhibitors, which may be involved in the regulation of
the blood clotting cascade and/or of the complement cascade in the mammalian host. On the basis of strong sequence
similarities, a number of proteins with no known inhibitory activity are said to belong to this family: - Birds ovalbumin
and the related genes X and Y proteins. - Angiotensinogen, the precursor of the angiotensin active peptide. - Barley
protein Z, the major endosperm albumin. - Corticosteroid binding globulin (CBG). - Thyroxine-binding globulin (TBG).
- Sheep uterine milk protein (UTMP) and pig uteroferrin-associated protein (UFAP). - Hsp47, an endoplasmic reticulum
heat-shock protein that binds strongly to collagen and could act as a chaperone in the collagen biosynthetic pathway
[7]. - Maspin, which seems to function as a tumor suppressor [5]. - Pigment epithelium-derived factor precursor (PEDF),
a protein with a strong neurotrophic activity [8]. - Ep45, an estrogen-regulated protein from Xenopus [9]. A signature
pattern has been developed for this family of proteins, centered on a well conserved Pro-Phe sequence which is found
ten to fifteen residues on the C-terminal side of the reactive bond.

[1504] Consensus pattern: [LIVMFY]-x-[LIVMFYAC]-[DNQ]-[RKHQS]-[PST]-F-[LIVMFY]-[LIVMFYC]-x-[LIVMFAH]
[ 1] Carrell R., Travis J. Trends Biochem. Sci. 10:20-24(1985). [ 2] Carrell R., Pemberton P.A., Boswell D.R. Cold Spring
Harbor Symp. Quant. Biol. 52:527-535(1987). [ 3] Huber R., Carrell R.W. Biochemistry 28:8951-8966(1989). [ 4]
Remold-O'Donnell E. FEBS Lett. 315:105-108(1993). [ 5] Osterwalder T., Contartese J., Stoeckli E.T., Kuhn T.B.,
Sonderegger P. EMBO J. 15:2944-2953(1996). [ 6] Komiyama T., Ray C.A., Pickup D.J., Howard A.D., Thornberry N.A.,
Peterson E.P., Salvesen G. J. Biol. Chem. 269:19331-19337(1994). [ 7] Clarke E., Sandwal B.D. Biochim. Biophys.
Acta 1129:246-248(1992). [ 8] Zou Z., Anisowicz A., Neveu M., Rafidi K., Sheng S., Sager R., Hendrix M.J., Seftor E.,
Thor A. Science 263:526-529(1994). [ 9] Steele F.R., Chader G.J., Johnson L.V., Tombran-Tink J. Proc. Natl. Acad.
Sci. U.S.A. 90:1526-1530(1993). [10] Holland L.J., Suksang C., Wall A.A., Roberts L.R., Moser D.R., Bhattacharya A.
J. Biol. Chem. 267:7053-7059(1992).

[1505] 624. Sigma-54 interaction domain signatures and profile

Some bacterial regulatory proteins activate the expression of genes from promoters recognised by core RNA polymerase
associated with the alternative sigma-54 factor. These have a conserved domain of about 230 residues involved
in the ATP-dependent [1,2] interaction with sigma-54. This domain has been found in the proteins listed below: - acoR
from Alcaligenes eutrophus, an activator of the acetoin catabolism operon acoXABC. - algB from Pseudomonas
aeruginosa, an activator of alginate biosynthetic gene algD. - dctD from Rhizobium, an activator of dctA, the
C4-dicarboxylate transport protein. - dhaR from Citrobacter freundii, a regulator of the dha operon for glycerol
utilization. - fhlA from Escherichia coli, an activator of the formate dehydrogenase H and hydrogenase III structural
genes. - flbD from Caulobacter crescentus, an activator of flagellar genes. - hoxA from Alcaligenes eutrophus, an
activator of the hydrogenase operon. - hrpS from Pseudomonas syringae, an activator of hprD as well as other hrp loci
involved in plant pathogenicity. - hupR1 from Rhodobacter capsulatus, an activator of the [NiFe] hydrogenase genes
hupSL. - hydG from Escherichia coli and Salmonella typhimurium, an activator of the hydrogenase activity. - levR from
Bacillus subtilis, which regulates the expression of the levanase operon (levDEFG and sacC). - nifA (as well as anfA
and vnfA) from various bacteria, an activator of the nif nitrogen-fixing operon. - ntrC, from various bacteria, an
activator of nitrogen assimilatory genes such as that for glutamine synthetase (glnA) or of the nif operon. - pgtA from
Salmonella typhimurium, the activator of the inducible phosphoglycerate transport system. - pilR from Pseudomonas
aeruginosa, an activator of pilin gene transcription. - rocR from Bacillus subtilis, an activator of genes for arginine
utilization. - tyrR from Escherichia coli, involved in the transcriptional regulation of aromatic amino-acid
biosynthesis and transport. - wtsA, from Erwinia stewartii, an activator of plant pathogenicity gene wtsB. - xylR from
Pseudomonas putida, the activator of the tol plasmid xylene catabolism operon xylCAB and of xylS. - Escherichia coli
hypothetical protein yfhA. - Escherichia coli hypothetical protein yhgB. About half of these proteins (algB, dctD,
flbD, hoxA, hupR1, hydG, ntrC, pgtA and pilR) belong to signal transduction two-component systems [3] and possess a
domain that can be phosphorylated by a sensor-kinase protein in their N-terminal section. Almost all of these proteins
possess a helix-turn-helix DNA-binding domain in their C-terminal section. The domain which interacts with the
sigma-54 factor has an ATPase activity. This may be required to promote a conformational change necessary for the
interaction [4]. The domain contains an atypical ATP-binding motif A (P-loop) as well as a form of motif B. The two
ATP-binding motifs are located in the N-terminal section of the domain; signature patterns have been developed for
both motifs. Other regions of the domain are also conserved. One of them, located in the C-terminal section, has been
selected as a third signature pattern.
Consensus pattern: [LIVMFY](3)-x-G-[DEQ]-[STE]-G-[STAV]-G-K-x(2)-[LIVMFY]
Consensus pattern: [GS]-x-[LIVMF]-x(2)-A-[DNEQASH]-[GNEK]-
Consensus pattern: [FYW]-P-[GS]-N-[LIVM]-R-[EQ]-L-x-[NHAT]

[ 1] Morett E., Segovia L. J. Bacteriol. 175:6067-6074(1993). [ 2] Austin S., Kundrot C., Dixon R. Nucleic Acids Res.
19:2281-2287(1991). [ 3] Albright L.M., Huala E., Ausubel F.M. Annu. Rev. Genet. 23:311-336(1989). [ 4] Austin S.,
Dixon R. EMBO J. 11:2219-2228(1992).

[1506] 625. Sigma-70 factors family signatures

Sigma factors [1] are bacterial transcription initiation factors that promote the attachment of the core RNA polymerase
to initiation sites and are then released. They alter the specificity of promoter recognition. Most bacteria express
a multiplicity of sigma factors. Two of these factors, sigma-70 (gene rpoD), generally known as the major or primary
sigma factor, and sigma-54 (gene rpoN or ntrA) direct the transcription of a wide variety of genes. The other sigma
factors, known as alternative sigma factors, are required for the transcription of specific subsets of genes. With regard
to sequence similarity, sigma factors can be grouped into two classes: the sigma-54 and sigma-70 families. The
sigma-70 family includes, in addition to the primary sigma factor, a wide variety of sigma factors, some of which are
listed below: - Bacillus sigma factors involved in the control of sporulation-specific genes: sigma-E (sigE or spoIIGB),
sigma-F (sigF or spoIIAC), sigma-G (sigG or spoIIIG), sigma-H (sigH or spoOC) and sigma-K (sigK or spoIVCB/spoIIIC).
- Escherichia coli and related bacteria sigma-32 (gene rpoH or htpR) involved in the expression of heat shock genes.
- Escherichia coli and related bacteria sigma-27 (gene fliA) involved in the expression of the flagellin gene.
- Escherichia coli sigma-S (gene rpoS or katF) which seems to be involved in the expression of genes required for
protection against external stresses. - Myxococcus xanthus sigma-B (sigB) which is essential for the late-stage
differentiation of that bacteria. Alignments of the sigma-70 family permit the identification of four regions of high
conservation [2,3]. Each of these four regions can in turn be subdivided into a number of sub-regions. Signature
patterns based on the two best-conserved sub-regions have been developed. The first pattern corresponds to
sub-region 2.2; the exact function of this sub-region is not known although it could be involved in the binding of the
sigma factor to the core RNA polymerase.
The second pattern corresponds to sub-region 4.2, which seems to harbor a DNA-binding 'helix-turn-helix' motif.

:-rrre-eTtS;=^^^^^^^^^ 

[ 1] Helmann J.D., Chamberlin M.J. Annu. Rev. Biochem. 57:839-872(1988). [ 2] Gribskov M., Burgess R.R. Nucleic
Acids Res. 14:6745-6763(1986). [ 3] Lonetto M.A., Gribskov M., Gross C.A. J. Bacteriol. 174:3843-3849(1992). [ 4]
Lonetto M.A., Brown K.L., Rudd K.E., Buttner M.J. Proc. Natl. Acad. Sci. U.S.A. 91:7573-7577(1994).
[1507] 626. Signal carboxyl-terminal domain. 430 members. 

grnSpeSasrct^sCmU^^^^^^ 

In prokaryotes three types of SPases are known: type I (gene lepB), which is responsible for the processing of the
majority of exported pre-proteins; type II (gene lsp), which only processes lipoproteins; and a third type involved in
the processing of pili subunits. SPase I is an integral membrane protein that is anchored in the cytoplasmic membrane
by one (in B. subtilis) or two (in E. coli) N-terminal transmembrane domains, with the main part of the protein
protruding in the periplasmic space. Two residues have been shown [2,3] to be essential for the catalytic activity of
SPase I: a serine and a lysine. SPase I is evolutionarily related to the yeast mitochondrial inner membrane protease
subunits 1 and 2 (genes IMP1 and IMP2), which catalyze the removal of signal peptides required for the targeting of
proteins from the mitochondrial matrix, across the inner membrane, into the inter-membrane space [4]. In eukaryotes
the removal of signal peptides is effected by an oligomeric enzymatic complex composed of at least five subunits: the
signal peptidase complex (SPC). The SPC is located in the endoplasmic reticulum membrane. Two components of
mammalian SPC, the 18 Kd (SPC18) and the 21 Kd (SPC21) subunits, as well as the yeast SEC11 subunit have been
shown [5] to share regions of sequence similarity with prokaryotic SPases I and yeast IMP1/IMP2. Three signature
patterns for these proteins have been developed. The first signature contains the putative active site serine, the second
signature contains the putative active site lysine, which is not conserved in the SPC subunits, and the third signature
corresponds to a conserved region of unknown biological significance which is located in the C-terminal section of all
these proteins.
Consensus pattern: [GS]-x-S-M-x-[PS]-[AT]-[LF] [S is an active site residue]
Consensus pattern: K-R-[LIVMSTA](2)-G-x-[PG]-G-[DE]-x-[LIVM]-x-[LIVMFY] [K is an active site residue]
Consensus pattern: [LIVMFYW](2)-x(2)-G-D-[NH]-x(3)-[SND]-x(2)-[SG]

[ 1] Dalbey R.E., von Heijne G. Trends Biochem. Sci. 17:474-478(1992). [ 2] Sung M., Dalbey R.E. J. Biol. Chem.
267:13154-13159(1992). [ 3] Black M.T. J. Bacteriol. 175:4957-4961(1993). [ 4] Nunnari J., Fox T.D., Walter P. Science
262:1997-2004(1993). [ 5] van Dijl J.M., de Jong A., Vehmaanpera J., Venema G., Bron S. EMBO J. 11:2819-2828(1992).
[ 6] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994). [E1]
[1509] 628. (sodcu) Copper/Zinc superoxide dismutase signatures

Copper/Zinc superoxide dismutase (SODC) [1] is one of the three forms of an enzyme that catalyzes the dismutation
of superoxide radicals. SODC binds one atom each of zinc and copper. Various forms of SODC are known: a cytoplasmic
form in eukaryotes, an additional chloroplast form in plants, an extracellular form in some eukaryotes, and a periplasmic
form in prokaryotes. The metal binding sites are conserved in all the known SODC sequences [2]. Two signature
patterns have been derived for this family of enzymes: the first one contains two histidine residues that bind the copper
atom; the second one is located in the C-terminal section of SODC and contains a cysteine which is involved in a
disulfide bond.
Consensus pattern: [GA]-[IMFAT]-H-[LIVF]-H-x(2)-[GP]-[SDG]-x-[STAGDE] [The two H's are copper ligands]
Consensus pattern: G-[GN]-[SGA]-G-x-R-x-[SGA]-C-x(2)-[IV] [C is involved in a disulfide bond]
[ 1] Bannister J.V., Bannister W.H., Rotilio G. CRC Crit. Rev. Biochem. 22:111-154(1987). [ 2] Smith M.W., Doolittle R.F.
J. Mol. Evol. 34:175-184(1992).
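
The consensus patterns listed in this section follow the PROSITE notation: positions are separated by hyphens, x stands for any residue, square brackets list the residues allowed at a position, and a number in parentheses gives a repeat count or range. As a minimal sketch of how such a pattern might be applied to a protein sequence, the Python code below translates a pattern into a regular expression and reports where it matches. The function names and the test sequence are illustrative assumptions and are not part of the disclosure; the anchors "<" and ">" of the full notation are not handled.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        element = element.strip()
        # Separate the residue specification from an optional count: x(2) or x(2,5).
        m = re.fullmatch(r"([^()]+)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError("cannot parse pattern element: %r" % element)
        residues, low, high = m.groups()
        if residues == "x":
            core = "."                           # any residue
        elif re.fullmatch(r"\[[A-Z]+\]", residues):
            core = residues                      # allowed residues, same syntax as a regex class
        elif re.fullmatch(r"\{[A-Z]+\}", residues):
            core = "[^" + residues[1:-1] + "]"   # forbidden residues
        elif re.fullmatch(r"[A-Z]", residues):
            core = residues                      # a single fixed residue
        else:
            raise ValueError("unsupported pattern element: %r" % element)
        if low is not None:
            core += "{" + (low if high is None else low + "," + high) + "}"
        parts.append(core)
    return "".join(parts)


def find_pattern(pattern, sequence):
    """Return (1-based start, matched segment) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]


# First SODC pattern from the text (the two H's are the copper ligands).
SODC_COPPER_SITE = "[GA]-[IMFAT]-H-[LIVF]-H-x(2)-[GP]-[SDG]-x-[STAGDE]"

# Hypothetical test sequence, invented for illustration only.
hits = find_pattern(SODC_COPPER_SITE, "MKGFHVHEAGDNTQ")
print(hits)
```

A regular expression is used here only because the notation maps onto it almost one-to-one; dedicated pattern-scanning tools would normally be preferred for whole-database searches.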

[1510] 629. (sodfe) Manganese and iron superoxide dismutases signature 

Manganese superoxide dismutase (SODM) [1] is one of the three forms of an enzyme that catalyzes the dismutation 
of superoxide radicals. The four ligands of the manganese atom are conserved in all the known SODM sequences. 
These metal ligands are also conserved in the related iron form of superoxide dismutases [2,3]. A short conserved
region which includes two of the four ligands, an aspartate and a histidine, has been selected as a signature.
Consensus pattern: D-x-W-E-H-[STA]-[FY](2) [D and H are manganese/iron ligands]

[ 1] Bannister J.V., Bannister W.H., Rotilio G. CRC Crit. Rev. Biochem. 22:111-154(1987). [ 2] Parker M.W., Blake C.C.F.
FEBS Lett. 229:377-382(1988). [ 3] Smith M.W., Doolittle R.F. J. Mol. Evol. 34:175-184(1992).
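
For orientation, the dismutation of superoxide radicals referred to in the two superoxide dismutase entries is conventionally written as the overall reaction below. This is standard textbook chemistry supplied as an aid to the reader; it is not taken from the text above, and it applies to the copper/zinc, manganese and iron forms alike.

```latex
\[
2\,\mathrm{O_2^{\bullet -}} + 2\,\mathrm{H^+} \longrightarrow \mathrm{O_2} + \mathrm{H_2O_2}
\]
```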
[1511] 630. Spectrin repeat

[1512] Spectrin repeats are found in several proteins involved in cytoskeletal structure. These include spectrin,
alpha-actinin and dystrophin. The sequence repeat used in this family is taken from the structural repeat in reference [2]. The
spectrin repeat forms a three helix bundle. The second helix is interrupted by proline in some sequences. 
Number of members: 898 

[1] Actin-binding proteins. 1: Spectrin super family. Hartwig JH; Protein Profile 1995;2:732-732.
[2] Crystal structure of the repetitive segments of spectrin. Yan Y, Winograd E, Viel A, Cronin T, Harrison SC,
Branton D; Science 1993;262:2027-2030.

[1513] 631. (subtilase) Streptomyces subtilisin-type inhibitors signature 

Bacteria of the Streptomyces family produce a family of proteinase inhibitors [1] characterized by their strong activity
toward subtilisin. They are collectively known as SSI's: Streptomyces Subtilisin Inhibitors. Some SSI's also inhibit
trypsin or chymotrypsin. In their mature secreted form, SSI's are proteins of about 110 residues with two conserved
disulfide bonds.

xxxxxxxxxxxxxxCxxxxxxxCxxxxxxxxxCx#xxxxxxxxxxxxCxxxxxx************
'C': conserved cysteine involved in a disulfide bond.
'#': active site residue.
'*': position of the pattern.
Consensus pattern: C-x-P-x(2,3)-G-x-H-P-x(4)-A-C-[ATD]-x-L [The two C's are involved in a disulfide bond]
[ 1] Taguchi S., Kojima S., Terabe M., Miura K.-I., Momose H. Eur. J. Biochem. 220:911-918(1994).
[1514] 632. Sugar transport proteins signatures 

In mammalian cells the uptake of glucose is mediated by a family of closely related transport proteins which are called
the glucose transporters [1,2,3]. At least seven of these transporters are currently known to exist (in human they are
encoded by the GLUT1 to GLUT7 genes). These integral membrane proteins are predicted to comprise twelve membrane
spanning domains. The glucose transporters show sequence similarities [4,5] with a number of other sugar or
metabolite transport proteins listed below (references are only provided for recently determined sequences).
- Escherichia coli arabinose-proton symport (araE). - Escherichia coli galactose-proton symport (galP). - Escherichia coli
and Klebsiella pneumoniae citrate-proton symport (also known as citrate utilization determinant) (gene cit).
- Escherichia coli alpha-ketoglutarate permease (gene kgtP). - Escherichia coli proline/betaine transporter (gene proP) [6].
- Escherichia coli xylose-proton symport (xylE). - Zymomonas mobilis glucose facilitated diffusion protein (gene glf).
- Yeast high and low affinity glucose transport proteins (genes SNF3, HXT1 to HXT14). - Yeast galactose transporter
(gene GAL2). - Yeast maltose permeases (genes MAL3T and MAL6T). - Yeast myo-inositol transporters (genes ITR1
and ITR2). - Yeast carboxylic acid transporter protein homolog JEN1. - Yeast inorganic phosphate transporter (gene
PHO84). - Kluyveromyces lactis lactose permease (gene LAC12). - Neurospora crassa quinate transporter (gene qa-y),
and Emericella nidulans quinate permease (gene qutD). - Chlorella hexose carrier (gene HUP1). - Arabidopsis
thaliana glucose transporter (gene STP1). - Spinach sucrose transporter. - Leishmania donovani transporters D1 and
D2. - Leishmania enriettii probable transport protein (LTP). - Yeast hypothetical proteins YBR241c, YCR98c and
YFL040W. - Caenorhabditis elegans hypothetical protein ZK637.1. - Escherichia coli hypothetical proteins yabE, ydjE
and yhjE. - Haemophilus influenzae hypothetical proteins HI0281 and HI0418. - Bacillus subtilis hypothetical proteins
yxbC and yxdF. It has been suggested [4] that these transport proteins have evolved from the duplication of an ancestral
protein with six transmembrane regions; this hypothesis is based on the conservation of two G-R-[KR] motifs. The first
one is located between the second and third transmembrane domains and the second one between transmembrane
domains 8 and 9. Two patterns have been developed to detect this family of proteins. The first pattern is based on the
G-R-[KR] motif; but because this motif is too short to be specific to this family of proteins, a pattern from a larger region
centered on the second copy of this motif was derived. The second pattern is based on a number of conserved residues
which are located at the end of the fourth transmembrane segment and in the short loop region between the fourth
and fifth segments.

Consensus pattern: [LIVMS
Consensus pattern: [LIVMF]-x-G-[LIVMFA]-x(2)-G-x(8)-[LIFY]-x(2)-[EQ]-x(6)-[RK]

[ 1] Silverman M. Annu. Rev. Biochem. 60:757-794(1991). [ 2] Gould G.W., Bell G.I. Trends Biochem. Sci. 15:18-23
(1990). [ 3] Baldwin S.A. Biochim. Biophys. Acta 1154:17-49(1993). [ 4] Maiden M.C.J., Davis E.O., Baldwin S.A., Moore
D.C.M., Henderson P.J.F. Nature 325:641-643(1987). [ 5] Henderson P.J.F. Curr. Opin. Struct. Biol. 1:590-601(1991).
[ 6] Culham D.E., Lasby B., Marangoni A.G., Milner J.L., Steer B.A., van Nues R.W., Wood J.M. J. Mol. Biol. 229:
268-276(1993).
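
The duplication hypothesis described for the sugar transporters rests on finding two copies of the short G-R-[KR] motif in one sequence. The following Python sketch, offered only as an illustration, lists every position at which that tripeptide occurs so that the two copies can be compared against the predicted transmembrane topology. The function name and the sequence fragment are hypothetical; as the text itself notes, the bare motif is too short to be specific, so hits of this kind are only a starting point.

```python
import re


def grkr_positions(sequence):
    """Return the 1-based start of every G-R-[KR] tripeptide, overlaps included."""
    # A lookahead makes each match zero-width, so no occurrence is skipped.
    return [m.start() + 1 for m in re.finditer(r"(?=GR[KR])", sequence)]


# Hypothetical sequence fragment, invented for illustration only.
example = "AAGRKAAAAGRRAA"
print(grkr_positions(example))
```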

[1515] 633. Synaptobrevin signature. Synaptobrevin [1] is an intrinsic membrane protein of small synaptic vesicles
whose function is not yet known, but which is highly conserved in mammals, electric ray (where it is known as VAMP-1),
Drosophila and yeast [2]. In yeast there are two closely related forms of synaptobrevin (genes SNC1 and SNC2), while
in mammals there are at least four (genes SYB1, SYB2, SYB3 and SYBL1). Structurally, synaptobrevin consists of an
N-terminal cytoplasmic domain of from 90 to 110 residues, followed by a transmembrane region, and then by a short
(from 2 to 22 residues) C-terminal intravesicular domain. As a signature pattern for synaptobrevin, a highly conserved
stretch of residues located in the central part of the sequence was selected.

[ 1] Suedhof T.C., Baumert M., Perin M.S., Jahn R. Neuron 2:1475-1481(1989). [ 2] Gerst J.E., Rodgers L., Riggs M.,
Wigler M. Proc. Natl. Acad. Sci. U.S.A. 89:4338-4342(1992).

[1516] 634. TBC domain. Identification of a TBC domain in GYP6_YEAST and GYP7_YEAST, which are GTPase
activator proteins of yeast Ypt6 and Ypt7, implies that these domains are GTPase activator proteins of Rab-like small
GTPases. Number of members: 55

[1] Medline: 96032578. Molecular cloning of a cDNA with a novel domain present in the tre-2 oncogene and the
yeast cell cycle regulators BUB2 and cdc16. Richardson PM, Zon LI; Oncogene 1995;11:1139-1148.
[2] Medline: 97398935. A shared domain between a spindle assembly checkpoint protein and Ypt/Rab-specific
GTPase-activators. Neuwald AF; Trends Biochem Sci 1997;22:243-244.

[1517] 635. Transcription factor TFIID repeat signature (TBP)

Transcription factor TFIID (or TATA-binding protein, TBP) [1,2] is a general factor that plays a major role in the activation
of eukaryotic genes transcribed by RNA polymerase II. TFIID binds specifically to the TATA box promoter element,
which lies close to the position of transcription initiation. There is a remarkable degree of sequence conservation of a
C-terminal domain of about 180 residues in TFIID from various eukaryotic sources. This region is necessary and
sufficient for TATA box binding. The most significant structural feature of this domain is the presence of two conserved
repeats of a 77 amino-acid region. The intramolecular symmetry generates a saddle-shaped structure that sits astride
the DNA [3]. Drosophila TRF (TBP-related factor) [4] is a sequence-specific transcription factor that also binds to the
TATA box and is highly similar to TFIID. Archaebacteria also possess a TBP homolog [5]. A signature pattern that
spans the last 50 residues of the repeated region has been derived.

Consensus pattern: Y-x-P-x(2)-[IF]-x(2)-[LIVM](2)-x-[KRH]-x(3)-P-[RKQ]-x(3)-L-[LIVM]-F-x-[STN]-G-[KR]-[LIVM]-x
(3)-G-[TAGL]-[KR]-x(7)-[AGC]-x(7)-[LIVMF]
[ 1] Hoffmann A., Sinn E., Yamamoto T., Wang J., Roy A., Horikoshi M., Roeder R.G. Nature 346:387-390(1990).
[ 2] Gasch A., Hoffmann A., Horikoshi M., Roeder R.G., Chua N.-H. Nature 346:390-394(1990). [ 3] Nikolov D.B.,
Hu S.-H., Lin J., Gasch A., Hoffmann A., Horikoshi M., Chua N.-H., Roeder R.G., Burley S.K. Nature 360:40-46(1992).
[ 4] Crowley T.E., Hoey T., Liu J.-K., Jan Y.N., Jan L.Y., Tjian R. Nature 361:557-561(1993). [ 5] Marsh T.L., Reich C.I.,
Whitelock R.B., Olsen G.J. Proc. Natl. Acad. Sci. U.S.A. 91:4180-4184(1994).

[1518] 636. Translationally controlled tumor protein signatures (TCTP)

Mammalian translationally controlled tumor protein (TCTP) (or P23) is a protein which has been found to be
preferentially synthesized in cells during the early growth phase of some types of tumor [1,2], but which is also
expressed in normal cells. The physiological function of TCTP is still not known. It is a hydrophilic protein of 18 to
20 Kd. Close homologs have been found in plants [3], earthworm [4], Caenorhabditis elegans (F25H2.11), budding yeast
(YKL056C) [5] and fission yeast (SpAC1F12.02c). Two of the best conserved regions have been selected as signature
patterns for TCTP.

Consensus pattern: [IFA]-[GA]-[GAS]-N-[PAK]-S-[GA]-E-[GDE]-[PAGE]-[DEQGA]
Consensus pattern: [FLVH]-[FY]-[IVCT]-G-E-x-[MA]-x(2,5)-[DEN]-[GAST]-x-[LV]-[AV]-x(3)-[FYW]
[ 1] Boehm H., Beendorf R., Gaestel M., Gross B., Nuernberg P., Kraft R., Otto A., Bielka H. Biochem. Int. 19:277-286
(1989). [ 2] Makrides S., Chitpatima S.T., Bandyopadhyay R., Brawerman G. Nucleic Acids Res. 16:2350-2350(1988).
[ 3] Pay A., Heberle-Bors E., Hirt H. Plant Mol. Biol. 19:501-503(1992). [ 4] Stuerzenbaum S.R., Kille P., Morgan A.J.
Biochim. Biophys. Acta 1398:294-304(1998). [ 5] Rasmussen S.W. Yeast 10:S63-S68(1994).

[1519] 637. TFIIS zinc ribbon domain signature

Transcription factor S-II (TFIIS) [1] is a eukaryotic protein necessary for efficient RNA polymerase II transcription
elongation past template-encoded pause sites. TFIIS shows DNA-binding activity only in the presence of RNA polymerase
II. It is a protein of about 300 amino acids whose sequence is highly conserved in mammals, Drosophila, yeast (where
it was first known as PPR2, a transcriptional regulator of URA4, and then as DST1, the DNA strand transfer protein
alpha [2]) and in the archaebacterium Sulfolobus acidocaldarius [3]. This family also includes the eukaryotic and
archaebacterial RNA polymerase subunits of the 15 Kd / M family (see <PDOC00790>) as well as the following viral
proteins: - Vaccinia virus RNA polymerase 30 Kd subunit (rpo30) [4]. - African swine fever virus protein I243L [5]. The
best conserved region of all these proteins contains four cysteines that bind a zinc ion and fold in a conformation
termed a 'zinc ribbon' [6]. Besides these cysteines, there are a number of other conserved residues which can be used
to help define a specific pattern for this type of domain.

Consensus pattern: C-x(2)-C-x(9)-[LIVMQSAR]-[QH]-[STQL]-[RA]-[SACR]-x-[DE]-[DET]-[PGSEA]-x(6)-C-x(2,5)-C-x
(3)-[FW] [The four C's are zinc ligands]

[ 1] Hirashima S., Hirai H., Nakanishi Y., Natori S. J. Biol. Chem. 263:3858-3863(1988). [ 2] Kipling D., Kearsey S.E.
Nature 353:509-509(1991). [ 3] Langer D., Zillig W. Nucleic Acids Res. 21:2251-2251(1993). [ 4] Ahn B.-Y.,
Gershon P.D., Jones E.V., Moss B. Mol. Cell. Biol. 10:5433-5441(1990). [ 5] Rodriguez J.M., Salas M.L.,
Vinuela E. Virology 186:40-52(1992). [ 6] Qian X., Jeon C., Yoon H., Agarwal K., Weiss M.A. Nature 365:277-279(1993).
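
Several consensus patterns in this section annotate specific residues, for example the four zinc-binding cysteines of the TFIIS zinc ribbon. A hedged sketch of how those annotated positions might be recovered from a matching sequence is given below: the zinc ribbon pattern is written by hand as a regular expression with one capture group per cysteine. The regular expression is a transcription of the pattern above, while the function name and the synthetic test sequence are assumptions made for illustration only.

```python
import re

# TFIIS zinc ribbon pattern from the text; each (C) group marks one zinc ligand.
TFIIS_ZINC_RIBBON = re.compile(
    r"(C).{2}(C).{9}[LIVMQSAR][QH][STQL][RA][SACR].[DE][DET][PGSEA]"
    r".{6}(C).{2,5}(C).{3}[FW]"
)


def zinc_ligand_positions(sequence):
    """Return the 1-based positions of the four cysteines, or None if no match."""
    m = TFIIS_ZINC_RIBBON.search(sequence)
    if m is None:
        return None
    return [m.start(group) + 1 for group in range(1, 5)]


# Synthetic sequence built to satisfy the pattern; it is not a real protein.
synthetic = (
    "C" + "AA" + "C" + "A" * 9 + "LQSRSADDP" + "A" * 6 + "C" + "AA" + "C" + "AAA" + "F"
)
print(zinc_ligand_positions(synthetic))
```

Capture groups are used instead of a second scan for cysteines because only the cysteines at the pattern-defined positions are the annotated ligands; other cysteines inside the x stretches would otherwise be reported as well.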
[1520] 638. Tetrahydrofolate dehydrogenase/cyclohydrolase signatures (THF DHG CYH) 

Enzymes that participate in the transfer of one-carbon units are involved in various biosynthetic pathways. In many of
these processes the transfers of one-carbon units are mediated by the coenzyme tetrahydrofolate (THF). Various
reactions generate one-carbon derivatives of THF which can be interconverted between different oxidation states by
formyltetrahydrofolate synthetase (EC 6.3.4.3), methylenetetrahydrofolate dehydrogenase (EC 1.5.1.5 or EC 1.5.1.15)
and methenyltetrahydrofolate cyclohydrolase (EC 3.5.4.9). The dehydrogenase and cyclohydrolase activities are
expressed by a variety of multifunctional enzymes: - Eukaryotic C-1-tetrahydrofolate synthase (C1-THF synthase), which
catalyzes all three reactions described above. Two forms of C1-THF synthases are known [1]: one is located in the
mitochondrial matrix, while the second one is cytoplasmic. In both forms the dehydrogenase/cyclohydrolase domain
is located in the N-terminal section of the 900 amino acid protein and consists of about 300 amino acid residues. The
C1-THF synthases are NADP-dependent. - Eukaryotic mitochondrial bifunctional dehydrogenase/cyclohydrolase [2].
This is a homodimeric NAD-dependent enzyme of about 300 amino acid residues. - Bacterial folD [3]. FolD is a
homodimeric bifunctional NADP-dependent enzyme of about 290 amino acid residues. The sequence of the
dehydrogenase/cyclohydrolase domain is highly conserved in all forms of the enzyme. Two conserved regions have been
selected as signature patterns. The first one is located in the N-terminal part of these enzymes and contains three
acidic residues. The second pattern is a highly conserved sequence of 9 amino acids which is located in the C-terminal
section.

Consensus pattern: [EQ]-x-[EQK]-[LIVM](2)-x(2)-[LIVM]-x(2)-[LIVMY]-N-x-[DN]-x(5)-[LIVMF](3)-Q-L-P-[LV]
Consensus pattern: P-G-G-V-G-P-[MF]-T-[IV]

[ 1] Shannon K.W., Rabinowitz J.C. J. Biol. Chem. 263:7717-7725(1988). [ 2] Belanger C., Mackenzie R.E. J. Biol.
Chem. 264:4837-4843(1989). [ 3] d'Ari L., Rabinowitz J.C. J. Biol. Chem. 266:23953-23958(1991).
[1521] 639. Triosephosphate isomerase active site (TIM) 

Triose phosphate isomerase (EC 5.3.1.1) (TIM) [1] is the glycolytic enzyme that catalyzes the reversible interconversion
of glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. TIM plays an important role in several metabolic
pathways and is essential for efficient energy production. It is a dimer of identical subunits, each of which is made up
of about 250 amino-acid residues. A glutamic acid residue is involved in the catalytic mechanism [2]. The sequence
around the active site residue is perfectly conserved in all known TIM's and can be used as a signature pattern for this
type of enzyme.

Consensus pattern: [AV]-Y-E-P-[LIVM]-W-[SA]-I-G-T-[GK] [E is the active site residue]

[ 1] Lolis E., Alber T., Davenport R.C., Rose D., Hartman F.C., Petsko G.A. Biochemistry 29:6609-6618(1990). [ 2]
Knowles J.R. Nature 350:121-124(1991).

[1522] 640. Thymidine kinase cellular-type signature (TK) 

Thymidine kinase (TK) (EC 2.7.1.21) is a ubiquitous enzyme that catalyzes the ATP-dependent phosphorylation of
thymidine. A comparison of TK sequences has shown [1,2,3] that there are two different families of TK. One family
groups together TK from herpes viruses as well as cellular thymidylate kinases, while the second family currently
consists of TK from the following sources: - Vertebrates. - Bacteria. - Bacteriophage T4. - Pox viruses. - African swine
fever virus (ASF). - Fish lymphocystis disease virus (FLDV). A conserved region which is located in the C-terminal
16:11303-11317(1988).

[1523] 641. Thymidine kinase from herpesvirus (TK herpes)
[1] Crystal structures of the thymidine kinase from herpes simplex virus type-1 in complex with deoxythymidine and
ganciclovir. Brown DG, Visse R, Sandhu G, Davies A, Rizkallah PJ, Melitz C, Summers WC, Sanderson MR;
Nat Struct Biol 1995;2:876-881.
Number of members: 65

Two patterns have been developed: one located in the N-terminal domain, the other in the C-terminal.
Consensus pattern: H-x(3)-H-S-[NS]-S-x-P-Q-S

t>ronLseen^mes are structurally related. TheserelatedTT^^ 

Reaction catalyzed: pyruvate . orthophosphate . 0(2) . H(2)0 =^^^^^^^^^0(2^^^^^ deLrbox- 

decart,oxy.ase (PDC) (EC 4JJ^ Reac^on cata^^^^^^^^ 

ylase(EC4JJ^)[2]ReactioncatalyzJ:.ndo^^^^^^ 

Lwt^^ai section has been selected as a signature P^tlem for these 

[1526] 644. TPR Domain
[1]
Medline: 95397415
Tetratrico peptide repeat interactions: to TPR or not to TPR?
Lamb JR, Tugendreich S, Hieter P;
Trends Biochem Sci 1995;20:257-259.
[2]
The structure of the tetratricopeptide repeats of protein phosphatase 5: implications for TPR-mediated protein-protein
interactions.
Das AK, Cohen PW, Barford D;
EMBO J 1998;17:1192-1199.
Number of members: 621

[1527] 645. Uwn*ynWII f'^JS^jSli^ri^SSt. ,r.n.f, ofK» m«h/ 9 W S-
Pseudomonas denitrificans (cobA) or Methanobacterium ivanovii (gene corA) SUMT is a protein of about 25 to 30 Kd.
In Escherichia coli and related bacteria, the cysG protein, which is involved in the biosynthesis of siroheme, is a
multifunctional protein composed of an N-terminal domain, probably involved in transforming precorrin-2 into siroheme, and
a C-terminal domain which has SUMT activity. The sequence of SUMT is related to that of a number of P. denitrificans
and Salmonella typhimurium enzymes involved in the biosynthesis of cobalamin which also seem to be SAM-dependent
methyltransferases [3,4]. The similarity is especially strong with two of these enzymes: cobI/cbiL which encodes
S-adenosyl-L-methionine-precorrin-2 methyltransferase and cobM/cbiF whose exact function is not known. Two signature
patterns have been developed for these enzymes. The first corresponds to a well conserved region in the N-terminal
extremity (called region 1 in [1,3]) and the second to a less conserved region located in the central part of
these proteins (this pattern spans what are called regions 2 and 3 in [1,3]).

Consensus pattern: [LIVM]-[GS]-[STAL]-G-P-G-x(3)-[LIVMFY]-[LIVM]-T-[LIVM]-[KRHQG]-[AG]

Consensus pattern: V-x(2)-[LI]-x(2)-G-D-x(3)-[FYW]-[GS]-x(8)-[LIVF]-x(5,6)-[LIVMFYWPAC]-x-[LIVMY]-x-P-G

[1] Blanche F., Robin C., Couder M., Faucher D., Cauchois L., Cameron B., Crouzet J. J. Bacteriol. 173:4637-4645(1991).
[2] Robin C., Blanche F., Cauchois L., Cameron B., Couder M., Crouzet J. J. Bacteriol. 173:4893-4896(1991).
[3] Crouzet J., Cameron B., Cauchois L., Rigault S., Rouyez M.-C., Blanche F., Thibaut D., Debussche L. J. Bacteriol.
172:5980-5990(1990).
[4] Roth J.R., Lawrence J.G., Rubenfield M., Kieffer-Higgins S., Church G.M. J. Bacteriol. 175:3303-3316(1993).
[5] Mattheakis L.C., Shen W.H., Collier R.J. Mol. Cell. Biol. 12:4026-4037(1992).
[1528] 646. Tudor domain

Domain of unknown function present in several RNA-binding proteins, copies in the Drosophila Tudor protein. Slight
ambiguities in the alignment.
Number of members: 18
[1] Medline: 97200561 Tudor domains in proteins that interact with RNA. Ponting CP; Trends Biochem Sci 1997;22:51-52.
[2] Medline: 97157029 The human EBNA-2 coactivator p100: multidomain organization and relationship to the
staphylococcal nuclease fold and to the tudor protein involved in Drosophila melanogaster development. Callebaut I,
Mornon JP; Biochem J 1997;321:125-132.
[1529] 647. Terpene synthase family

It has been suggested that this gene family be designated tps (for terpene synthase) [1]. It has been split into six
subgroups on the basis of phylogeny, called tpsa-tpsf.
tpsa includes vetispiradiene synthase, Swiss:Q39979, 5-epi-aristolochene synthase, Swiss:Q40577 and
(+)-delta-cadinene synthase, Swiss:P93665.
tpsb includes (-)-limonene synthase, Swiss:Q40322.
tpsc includes kaurene synthase A, Swiss:O04408.
tpsd includes taxadiene synthase, Swiss:Q41594, pinene synthase, Swiss:O24475 and myrcene synthase, Swiss:O24474.
tpse includes kaurene synthase B.
tpsf includes linalool synthase.
Number of members: 51
[1]
Medline: 97413772
Monoterpene synthases from grand fir (Abies grandis). cDNA isolation, characterization, and functional expression of
myrcene synthase, (-)-(4S)-limonene synthase, and (-)-(1S,5S)-pinene synthase.
Bohlmann J, Steele CL, Croteau R;
J Biol Chem 1997;272:21784-21792.
[1530] 648. ThiF family 

This family contains a repeated domain in ubiquitin activating enzyme E1 and members of the bacterial
ThiF/MoeB/HesA family.
Number of members: 87
[1531] 649. Thioester dehydrase

Members of this family are involved in fatty acid biosynthesis. 
Number of members: 19 
[1]
Medline: 96398612
Structure of a dehydratase-isomerase from the bacterial pathway for biosynthesis of unsaturated fatty acids: two
catalytic activities in one active site.
Leesong M, Henderson BS, Gillig JR, Schwab JM, Smith JL;
Structure 1996;4:253-264.
Database Reference: SCOP; 1mka; fa; [SCOP-USA] [CATH-PDBSUM]
Database reference: PFAMB; PB058036;
[1532] 650. Tub family signatures

The mouse tubby mutation is the cause of maturity-onset obesity, insulin resistance and sensory deficits. This mutation
maps to a gene, tub [1,2], which codes for a protein that belongs to a family which currently consists of the following
members: - Mammalian tub, a hydrophilic protein of about 500 residues, which could be involved in the hypothalamic
regulation of body weight. - Human protein TULP1 [3] which may be involved in retinitis pigmentosa 14, a retinal
degeneration disease. - Mouse protein p4-6 whose function is not known. - Caenorhabditis elegans hypothetical protein
F10B?r. - Several fragmentary sequences from plants, Drosophila and human ESTs. While the N-terminal part of
these proteins is not conserved in length nor in sequence, the C-terminal 250 residues are highly conserved. Therefore
two regions were selected in the C-terminal part as signature patterns. The second region is located at the
C-terminal extremity and contains a penultimate cysteine residue that could be critical to the normal functioning of these
proteins.

Consensus pattern: F-[KHQ]-G-R-V-[ST]-x-A-S-V-K-N-F-Q

Consensus pattern: A-F-[AG]-I-[SAC]-[LIVM]-[ST]-S-F-x-[GST]-K-x-A-C-E

[1] Kleyn P.W., Fan W., Kovats S.G., Lee J.L., Pulido J.C., Wu Y., Berkemeier L.R., Misumi D.J., Holmgren L., Charlat
O., Woolf E.A., Tayber O., Brody T., Shu P., Hawkins F., Kennedy B., Baldini L., Ebeling C., Alperin G.D., Deeds J.,
Lakey N.D., Culpepper J., Chen H., Gluecksmann-Kuis M.A., Carlson G.A., Duyk G.M., Moore K.J. Cell 85:281-290(1996).
[2] Noben-Trauth K., Naggert J.K., North M.A., Nishina P.M. Nature 380:534-538(1996).
[3] North M.A., Naggert J.K., Yan Y., Noben-Trauth K., Nishina P.M. Proc. Natl. Acad. Sci. U.S.A. 94:3128-3133(1997).
[1533] 651. Eukaryotic DNA topoisomerase I active site

DNA topoisomerase I (EC 5.99.1.2) [1,2,3,4,E1] is one of the two types of enzyme that catalyze the interconversion
of topological DNA isomers. Type I topoisomerases act by catalyzing the transient breakage of DNA, one strand at a
time, and the subsequent rejoining of the strands. When a eukaryotic type I topoisomerase breaks a DNA backbone
bond, it simultaneously forms a protein-DNA link where the hydroxyl group of a tyrosine residue is joined to a
3'-phosphate on DNA, at one end of the enzyme-severed DNA strand. In eukaryotes and pox virus topoisomerases I,
there are a number of conserved residues in the region around the active site tyrosine.

Consensus pattern: [DEN]-x(6)-[GS]-[IT]-S-K-x(2)-Y-[LIVM]-x(3)-[LIVM] [Y is the active site tyrosine]

[1] Sternglanz R. Curr. Opin. Cell Biol. 1:533-535(1990).
[2] Sharma A., Mondragon A. Curr. Opin. Struct. Biol. 5:39-47(1995).
[3] Lynn R.M., Bjornsti M.-A., Caron P.R., Wang J.C. Proc. Natl. Acad. Sci. U.S.A. 86:3559-3563(1989).
[4] Roca J. Trends Biochem. Sci. 20:156-160(1995).
[E1]

[1534] 652. Transaldolase signatures

Transaldolase (EC 2.2.1.2) catalyzes the reversible transfer of a three-carbon ketol unit from sedoheptulose 7-phosphate
to glyceraldehyde 3-phosphate to form erythrose 4-phosphate and fructose 6-phosphate. This enzyme, together with
transketolase, provides a link between the glycolytic and pentose-phosphate pathways. Transaldolase is an
enzyme of about 34 Kd whose sequence has been well conserved throughout evolution. A lysine has been implicated [1]
in the catalytic mechanism of the enzyme; it acts as a nucleophilic group that attacks the carbonyl group of fructose
6-phosphate. Transaldolase is evolutionarily related [2] to a bacterial protein whose exact function is not yet known.
Two signature patterns have been developed for these proteins. The first, located in the N-terminal section, contains
a perfectly conserved pentapeptide; the second includes the active site lysine.

[1] Miosga T., Schaaff-Gerstenschlaeger I., Franken E., Zimmermann F.K. Yeast 9:1241-1249(1993).
[2] Reizer J., Reizer A., Saier M.H. Jr. Microbiology 141:961-971(1995).
[1535] 653. (Transpeptidase) Penicillin binding protein transpeptidase domain
[1536] The active site serine (residue 337 in Swiss:P14677) is conserved in all
[1537] [1] Pares S, Mouz N, Petillot Y, Hakenbeck R, Dideberg O; Nat Struct Biol 1996;3:284-289.

[1538] 654. Trehalase signatures

Trehalase (EC 3.2.1.28) is the enzyme responsible for the degradation of the disaccharide alpha,alpha-trehalose,
yielding two glucose subunits [1]. It is an enzyme found in a wide variety of organisms whose sequence has been
highly conserved throughout evolution. Two of the most highly conserved regions have been selected as signature
patterns. The first pattern is located in the central section, the second one is in the C-terminal region.

Consensus pattern: P-G-G-R-F-x-E-x-Y-x-W-D-x-Y

Consensus pattern: Q-W-D-x-P-x-[GA]-W-[PAS]-P

[1] Kopp M., Mueller H., Holzer H. J. Biol. Chem. 268:4766-4774(1993).
[2] Henrissat B., Bairoch A. Biochem. J. 293:781-788(1993).
[E1]

[1539] 655. Trehalose-6-phosphate synthase domain

[1540] OtsA (Trehalose-6-phosphate synthase) is homologous to regions in the subunits of the yeast
trehalose-6-phosphate synthase/phosphatase complex [1].

[1541] [1] Kaasen I, McDougall J, Strom AR; Gene 1994;145:9-15.
[1542] 656. Tropomyosins signature
Tropomyosins [1,2] are a family of closely related proteins present in muscle and non-muscle cells. In striated muscle,
tropomyosin mediates the interactions between the troponin complex and actin so as to regulate muscle contraction.
The role of tropomyosin in smooth muscle and non-muscle tissues is not clear. Tropomyosin is an alpha-helical protein
that forms a coiled-coil dimer. Muscle isoforms of tropomyosin are characterized by having 284 amino acid residues
and a highly conserved N-terminal region, whereas non-muscle forms are generally smaller and are heterogeneous
in their N-terminal region. The signature pattern for tropomyosins is based on a very conserved region in the C-terminal
section of tropomyosins which is present in both muscle and non-muscle forms.
Consensus pattern: L-K-E-A-E-x-R-A-E

[1] Smillie L.B. Trends Biochem. Sci. 4:151-155(1979).
[2] McLeod A.R. BioEssays 6:208-212(1986).
[1543] 657. Troponin

Troponin (Tn) contains three subunits: Ca2+ binding (TnC), inhibitory (TnI), and tropomyosin binding (TnT). This Pfam
contains members of the TnT subunit.

Troponin is a complex of three proteins, Ca2+ binding (TnC), inhibitory (TnI), and tropomyosin binding (TnT).
The troponin complex regulates Ca2+ induced muscle contraction.
This family includes troponin T and troponin I. Troponin I binds to actin and troponin T binds to tropomyosin.
Number of members: 81
[1]
Medline: 87144593
Structure of co-crystals of tropomyosin and troponin.
White SP, Cohen C, Phillips GN Jr;
Nature 1987;325:826-828.
[2]
Medline: 95155315
A direct regulatory role for troponin T and a dual role for troponin C in the Ca2+ regulation of muscle contraction.
Potter JD, Sheng Z, Pan BS, Zhao J;
J Biol Chem 1995;270:2557-2562.
[3]
Medline: 95324796
The troponin complex and regulation of muscle contraction.
Farah CS, Reinach FC;
FASEB J 1995;9:755-767.
[1544] 658. (Tryp mucin) Mucin-like glycoprotein

[1545] This family of trypanosomal proteins resembles vertebrate mucins. The protein consists of three regions. The
N and C termini are conserved between all members of the family, whereas the central region is not well conserved
and contains a large number of threonine residues which can be glycosylated [1].
Indirect evidence suggested that these genes might encode the core protein of parasite mucins, glycoproteins that
were proposed to be involved in the interaction with, and invasion of, mammalian host cells.

[1] Di Noia JM, Sanchez DO, Frasch AC; J Biol Chem 1995;270:24146-24149.
[2] Di Noia JM, D'Orso I, Aslund L, Sanchez DO, Frasch AC; J Biol Chem 1998;273:10843-10850.

[1546] 659. Aminoacyl-transfer RNA synthetases class-I signature (tRNA synt 1)

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them
to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty
different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally
two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all
these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure.
A few years ago it was found [2] that several aminoacyl-tRNA synthetases share a region of similarity in their N-terminal
section, in particular the consensus tetrapeptide His-Ile-Gly-His ('HIGH') is very well conserved. The 'HIGH' region has
been shown [3] to be part of the adenylate binding site. The 'HIGH' signature has been found in the aminoacyl-tRNA
synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine,
tryptophan, and valine. These aminoacyl-tRNA synthetases are referred to as class-I synthetases [4,5,6] and seem to
share the same tertiary structure based on a Rossmann fold.
Consensus pattern: P-x(0,2)-[GSTAN]-[DENQGAPK]-x-[LIVMFP]-[HT]-[LIVMYAC]-G-[HNTG]-[LIVMFYSTAGPC]

[1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987).
[2] Webster T., Tsai H., Kula M., Mackie G.A., Schimmel P. Science 226:1315-1317(1984).
[3] Brick P., Bhat T.N., Blow D.M. J. Mol. Biol. 208:83-98(1988).
[4] Delarue M., Moras D. BioEssays 15:675-687(1993).
[5] Schimmel P. Trends Biochem. Sci. 16:1-3(1991).
[6] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).

[1547] 660. Aminoacyl-transfer RNA synthetases class-I signature (tRNA synt 1b)

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them
to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty
different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally
two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all
these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure.
A few years ago it was found [2] that several aminoacyl-tRNA synthetases share a region of similarity in their N-terminal
section, in particular the consensus tetrapeptide His-Ile-Gly-His ('HIGH') is very well conserved. The 'HIGH' region has
been shown [3] to be part of the adenylate binding site. The 'HIGH' signature has been found in the aminoacyl-tRNA
synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine,
tryptophan, and valine. These aminoacyl-tRNA synthetases are referred to as class-I synthetases [4,5,6] and seem to
share the same tertiary structure based on a Rossmann fold.
Consensus pattern: P-x(0,2)-[GSTAN]-[DENQGAPK]-x-[LIVMFP]-[HT]-[LIVMYAC]-G-[HNTG]-[LIVMFYSTAGPC]

[1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987).
[2] Webster T., Tsai H., Kula M., Mackie G.A., Schimmel P. Science 226:1315-1317(1984).
[3] Brick P., Bhat T.N., Blow D.M. J. Mol. Biol. 208:83-98(1988).
[4] Delarue M., Moras D. BioEssays 15:675-687(1993).
[5] Schimmel P. Trends Biochem. Sci. 16:1-3(1991).
[6] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).

[1548] 661. (tRNA-synt 1c) tRNA synthetases class I (E and Q)

[1549] Other tRNA synthetase sub-families are too dissimilar to be included.
This family includes only glutamyl and glutaminyl tRNA synthetases.
In some organisms, a single glutamyl-tRNA synthetase aminoacylates both tRNA(Glu) and tRNA(Gln).

[1550] [1] Rath VL, Silvian LF, Beijer B, Sproat BS, Steitz TA; Structure 1998;6:439-449.

[1551] 662. (tRNA-synt 1d) tRNA synthetases class I (R)

[1552] Other tRNA synthetase sub-families are too dissimilar to be included.
This family includes only arginyl tRNA synthetase.

[1553] 663. Aminoacyl-transfer RNA synthetases class-II signatures (tRNA synt 2)

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them
to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty
different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally
two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all
these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure.
The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine,
and threonine are referred to as class-II synthetases [2 to 6] and probably have a common folding pattern in their
catalytic domain for the binding of ATP and amino acid which is different to the Rossmann fold observed for the class
I synthetases [7]. Class-II tRNA synthetases do not share a high degree of similarity; however, at least three conserved
regions are present [2,5,8]. Signature patterns have been derived from two of these regions.
Consensus pattern: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]

Consensus pattern: [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVMFY]

[1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987).
[2] Delarue M., Moras D. BioEssays 15:675-687(1993).
[3] Schimmel P. Trends Biochem. Sci. 16:1-3(1991).
[4] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).
[5] Cusack S., Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991).
[6] Cusack S. Biochimie 75:1077-1081(1993).
[7] Cusack S., Berthet-Colominas C., Haertlein M., Nassar N., Leberman R. Nature 347:249-255(1990).
[8] Leveque F., Plateau P., Dessen P., Blanquet S. Nucleic Acids Res. 18:305-312(1990).
[1554] 664. Aminoacyl-transfer RNA synthetases class-I signature (tRNA synt 1e)

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them
to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty
different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally
two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all
these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure.
A few years ago it was found [2] that several aminoacyl-tRNA synthetases share a region of similarity in their N-terminal
section, in particular the consensus tetrapeptide His-Ile-Gly-His ('HIGH') is very well conserved. The 'HIGH' region has
been shown [3] to be part of the adenylate binding site. The 'HIGH' signature has been found in the aminoacyl-tRNA
synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine,
tryptophan, and valine. These aminoacyl-tRNA synthetases are referred to as class-I synthetases [4,5,6] and seem to
share the same tertiary structure based on a Rossmann fold.
Consensus pattern: P-x(0,2)-[GSTAN]-[DENQGAPK]-x-[LIVMFP]-[HT]-[LIVMYAC]-G-[HNTG]-[LIVMFYSTAGPC]

[1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987).
[2] Webster T., Tsai H., Kula M., Mackie G.A., Schimmel P. Science 226:1315-1317(1984).
[3] Brick P., Bhat T.N., Blow D.M. J. Mol. Biol. 208:83-98(1988).
[4] Delarue M., Moras D. BioEssays 15:675-687(1993).
[5] Schimmel P. Trends Biochem. Sci. 16:1-3(1991).
[6] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).
[1555] 665. Aminoacyl-transfer RNA synthetases class-II signatures (tRNA synt 2b)

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them
to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty
different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally
two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all
these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure.
The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine,
and threonine are referred to as class-II synthetases [2 to 6] and probably have a common folding pattern in their
catalytic domain for the binding of ATP and amino acid which is different to the Rossmann fold observed for the class
I synthetases [7]. Class-II tRNA synthetases do not share a high degree of similarity; however, at least three conserved
regions are present [2,5,8]. Signature patterns have been derived from two of these regions.
Consensus pattern: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]

Consensus pattern: [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVMFY]

[1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987).
[2] Delarue M., Moras D. BioEssays 15:675-687(1993).
[3] Schimmel P. Trends Biochem. Sci. 16:1-3(1991).
[4] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).
[5] Cusack S., Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991).
[6] Cusack S. Biochimie 75:1077-1081(1993).
[7] Cusack S., Berthet-Colominas C., Haertlein M., Nassar N., Leberman R. Nature 347:249-255(1990).
[8] Leveque F., Plateau P., Dessen P., Blanquet S. Nucleic Acids Res. 18:305-312(1990).
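
The class assignments given in the synthetase entries above can be summarized as a simple lookup table. The Python sketch below is illustrative only; the names `CLASS_I`, `CLASS_II` and `synthetase_class` are assumptions introduced here, and the amino acid lists are copied from the text.

```python
# Amino acid specificities as listed in the text for each synthetase class.
CLASS_I = {
    "arginine", "cysteine", "glutamic acid", "glutamine", "isoleucine",
    "leucine", "methionine", "tyrosine", "tryptophan", "valine",
}
CLASS_II = {
    "alanine", "asparagine", "aspartic acid", "glycine", "histidine",
    "lysine", "phenylalanine", "proline", "serine", "threonine",
}

def synthetase_class(amino_acid: str) -> str:
    """Return 'I' or 'II' for the synthetase specific to the given amino acid."""
    name = amino_acid.strip().lower()
    if name in CLASS_I:
        return "I"
    if name in CLASS_II:
        return "II"
    raise ValueError(f"unknown amino acid: {amino_acid!r}")

print(synthetase_class("Histidine"))
```

The two sets are disjoint and together cover the twenty standard amino acids, which matches the statement that there is one synthetase type for each amino acid.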
[1556] 666. Thaumatin family signature 

Thaumatin [1] is an intensely sweet-tasting protein (100 000 times sweeter than sucrose on a molar basis) from
Thaumatococcus daniellii, an African bush. The protein is made of about 200 residues and contains 8 disulfide bonds.
A number of proteins have been found to be related to thaumatins. These proteins are listed below (references are only
provided for recently determined sequences). - A maize alpha-amylase/trypsin inhibitor. - Two tobacco
pathogenesis-related proteins: PR-R major and minor forms, which are induced after infection with viruses. - Salt-induced
protein NP24 from tomato. - Osmotin, a salt-induced protein from tobacco. - Osmotin-like proteins OSML13, OSML15 and
OSML81 from potato [2]. - P21, a leaf protein from soybean. - PWIR2, a leaf protein from wheat. - Zeamatin, a maize
antifungal protein [3]. The exact biological function of all these proteins is not yet known. A conserved region that includes
three cysteine residues known (in thaumatin) to be involved in disulfide bonds has been selected as a signature pattern.
xxCxxxxxxxxxxxxxxxxCxxCxxCxCxxxxxxxxxxxxxxCxxCxCxxxCxCxxCCxCxxxCxxxxxCxxxCx
'C': conserved cysteine involved in a disulfide bond. '*': position of the pattern.
[Schematic: the disulfide-bond connecting lines and pattern position markers are not reproduced.]

Consensus pattern: G-x-[GF]-x-C-x-T-[GA]-D-C-x(1,2)-G-x(2,3)-C

[1] Edens L., Heslinga L., Klok R., Ledeboer A.M., Maat J., Toonen M.Y., Visser C., Verrips C.T. Gene 18:1-12(1982).
[2] Zhu B., Chen T.H.H., Li P.H. Plant Physiol. 108:929-937(1995).
[3] Malehorn D.E., Borgmeyer J.R., Smith C.E., Shah D.M. Plant Physiol. 106:1471-1481(1994).
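
To illustrate how the three conserved cysteines of the thaumatin signature could be located within a hit, the Python sketch below writes the consensus pattern as a regular expression with one capture group per cysteine. It is a minimal illustration and not part of the described procedure; the function name `conserved_cysteines` and the test sequence are hypothetical.

```python
import re

# Thaumatin signature G-x-[GF]-x-C-x-T-[GA]-D-C-x(1,2)-G-x(2,3)-C,
# written as a regex with each conserved cysteine in its own capture group.
THAUMATIN_RE = re.compile(r"G.[GF].(C).T[GA]D(C).{1,2}G.{2,3}(C)")

def conserved_cysteines(sequence: str):
    """Return 1-based positions of the three signature cysteines, or None."""
    m = THAUMATIN_RE.search(sequence)
    if m is None:
        return None
    return [m.start(i) + 1 for i in (1, 2, 3)]

# Hypothetical sequence built to satisfy the pattern.
print(conserved_cysteines("AAGAGKCNTGDCGGGLLCAA"))
```

Because the pattern contains the variable-length gaps x(1,2) and x(2,3), the spacing between the second and third cysteine can differ between family members, which is why the positions are read from the match rather than assumed.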
[1557] 667. Thiolases signatures 

Two different types of thiolase [1,2,3] are found both in eukaryotes and in prokaryotes: acetoacetyl-CoA thiolase (EC
2.3.1.9) and 3-ketoacyl-CoA thiolase (EC 2.3.1.16). 3-ketoacyl-CoA thiolase (also called thiolase I) has a broad
chain-length specificity for its substrates and is involved in degradative pathways such as fatty acid beta-oxidation.
Acetoacetyl-CoA thiolase (also called thiolase II) is specific for the thiolysis of acetoacetyl-CoA and involved in biosynthetic
pathways such as poly beta-hydroxybutyrate synthesis or steroid biogenesis. In eukaryotes, there are two forms of
3-ketoacyl-CoA thiolase: one located in the mitochondrion and the other in peroxisomes. There are two conserved
cysteine residues important for thiolase activity. The first, located in the N-terminal section of the enzymes, is involved
in the formation of an acyl-enzyme intermediate; the second, located at the C-terminal extremity, is the active site base
involved in deprotonation in the condensation reaction. Mammalian nonspecific lipid-transfer protein (nsL-TP) (also
known as sterol carrier protein 2) is a protein which seems to exist in two different forms: a 14 Kd protein (SCP-2) and
a larger 58 Kd protein (SCP-x). The former is found in the cytoplasm or the mitochondria and is involved in lipid transport;
the latter is found in peroxisomes.

The C-terminal part of SCP-x is identical to SCP-2 while the N-terminal portion is evolutionarily related to thiolases [4].
Three signature patterns have been developed for this family of proteins, two of which are based on the regions around
the biologically important cysteines. The third is based on a highly conserved region in the C-terminal part of these
proteins.

Consensus pattern: [LIVM]-[NST]-x(2)-C-[SAGLI]-[ST]-[SAG]-[LIVMFYNS]-x-[STAG]-[LIVM]-x(6)-[LIVM] [C is involved
in formation of acyl-enzyme intermediate]
Consensus pattern: N-x(2)-G-G-x-[LIVM]-[SA]-x-G-H-P-x-[GA]-x-[ST]-G

Consensus pattern: [AG]-[LIVMA]-[STAGCLIVM]-[STAG]-[LIVMA]-C-x-[AG]-x-[AG]-x-[AG]-x-[SAG] [C is the active site
residue]

[1] Peoples O.P., Sinskey A.J. J. Biol. Chem. 264:15293-15297(1989).
[2] Yang S.-Y., Yang X.-Y.H., Healy-Louie G., Schulz H., Elzinga M. J. Biol. Chem. 265:10424-10429(1990).
[3] Igual J.C., Gonzalez-Bosch C., Dopazo J., Perez-Ortin J.E. J. Mol. Evol. 35:147-155(1992).
[4] Baker M.E., Billheimer J.T., Strauss J.F. III DNA Cell Biol. 10:695-698(1991).

[1558] 668. Thioredoxin family active site
Thioredoxins [1 to 4] are small proteins of approximately one hundred amino-acid residues which participate in
redox reactions via the reversible oxidation of an active center disulfide bond. They exist in either a reduced form or
an oxidized form. Thioredoxin is present in prokaryotes and eukaryotes and the sequence around the redox-active
disulfide bond is well conserved. Bacteriophage T4 also encodes for a thioredoxin but its primary structure is not
homologous to bacterial, plant and vertebrate thioredoxins. A number of eukaryotic proteins contain domains
evolutionarily related to thioredoxin; they seem to be protein disulphide isomerases (PDI). PDI (EC 5.3.4.1) [5,6,7]
is an endoplasmic reticulum enzyme that catalyzes the rearrangement of disulfide bonds in various proteins. The various
forms of PDI which are currently known are: - PDI major isozyme: a multifunctional protein that also functions as the
beta subunit of prolyl 4-hydroxylase (EC 1.14.11.2), as a component of oligosaccharyl transferase (EC 2.4.1.119), as
thyroxine deiodinase (EC 3.8.1.4), as glutathione-insulin transhydrogenase (EC 1.8.4.2) and as a thyroid
hormone-binding protein. - ERp60 (a microsomal protein). ERp60 was originally thought to be a phosphoinositide-specific
phospholipase C isozyme and later to be a protease. - ERp72. - P5. All PDI contain two or three (ERp72) copies of the
thioredoxin domain. Bacterial proteins that act as thiol:disulfide interchange proteins that allow disulfide bond formation
also contain a thioredoxin domain. These proteins are: - Escherichia coli dsbA (or prfA) and its orthologs in Vibrio
cholerae (tcpG) and Haemophilus influenzae (por). - Escherichia coli dsbC (or xprA) and its orthologs in Erwinia
chrysanthemi and Haemophilus influenzae. - Escherichia coli dsbD (or dipZ) and its Haemophilus influenzae ortholog.
- Escherichia coli dsbE (or ccmG) and orthologs in Haemophilus influenzae, Rhodobacter capsulatus (helX) and
Rhizobiaceae (cycY and tlpA).
Consensus pattern: [LIVMF]-[LIVMSTA]-x-[LIVMFYC]-[FYWSTHE]-x(2)-[FYWGTN]-C-[GATPLVE]-[PHYWSTA]-C-x(6)-[LIVMFYWT]
[The two C's form the redox-active bond]

[1] Holmgren A. Annu. Rev. Biochem. 54:237-271(1985).
[2] Gleason F.K., Holmgren A. FEMS Microbiol. Rev. 54:271-297(1988).
[3] Holmgren A. J. Biol. Chem. 264:13963-13966(1989).
[4] Eklund H., Gleason F.K., Holmgren A. Proteins 11:13-28(1991).
[5] Freedman R.B., Hawkins H.C., Murant S.J., Reid L. Biochem. Soc. Trans. 16:96-99(1988).
[6] Kivirikko K.I., Myllyla R., Pihlajaniemi T. FASEB J. 3:1609-1617(1989).
[7] Freedman R.B., Hirst T.R., Tuite M.F. Trends Biochem. Sci. 19:331-336(1994).

[1559] 669. (Transcript fac2) Transcription factor TFIIB repeat signature

In eukaryotes the initiation of transcription of protein encoding genes by polymerase II is modulated by general and
specific transcription factors. The general transcription factors operate through common promoter elements (such as
the TATA box). At least seven different proteins associate to form the general transcription factors: TFIIA, -IIB, -IID,
-IIE, -IIF, -IIG, and -IIH [1]. Transcription factor IIB (TFIIB) plays a central role in this process: it
associates with a complex of TFIID-IIA bound to DNA (DA complex) to form a ternary complex TFIID-IIA-IIB (DAB
complex) which is then recognized by RNA polymerase II [2,3]. TFIIB is a protein of about 315 to 340 amino acid
residues which contains, in its C-terminal part, an imperfect repeat of a domain that could
contribute an element of symmetry to the folded protein. The following proteins have been shown to be evolutionarily
related to TFIIB: - An archaebacterial TFIIB homolog. In Pyrococcus woesei a previously undetected open reading
frame has been shown [4] to be highly related to TFIIB. - Fungal transcription factor IIIB 70 Kd subunit (gene
PCF4/TDS4/BRF1) [5]. This protein is a general activator of RNA polymerase III transcription and plays a role analogous
to that of TFIIB in pol III transcription. The central section of the repeated domain, which is the most conserved part of
the domain, has been selected as a signature pattern.

B., Jackson S.P. Genes Dev. 8:2879-2890(1994).

[1560] 670. (transcript fact) MADS-box domain signature and profile

A number of transcription factors contain a conserved domain of 56 amino-acid residues, sometimes known as the
MADS-box domain [E1]. They are listed below: - Serum response factor (SRF) [1], a mammalian transcription factor
that binds to the Serum Response Element (SRE). This is a short sequence of dyad symmetry located to the
5' end of the transcription initiation site of genes such as c-fos. - Mammalian myocyte-specific enhancer factors 2A to
2D (MEF2A to MEF2D). These proteins are transcription factors which bind specifically to the MEF2 element present
in the regulatory regions of many muscle-specific genes. - Drosophila myocyte-specific enhancer factor 2 (MEF2). -
Yeast GRM/PRTF protein (gene MCM1) [2], a transcriptional regulator of mating-type-specific genes. - Yeast arginine
metabolism regulation protein I (gene ARGR1). - [...] SMP1. - Arabidopsis thaliana agamous protein (AG) [3], a
probable transcription factor involved in regulating genes that determine stamen and carpel development in wild-type
flowers. Mutations in the AG gene result in the replacement of the stamens by petals and the carpels by a new flower.
- Arabidopsis thaliana homeotic proteins Apetala1 (AP1), Apetala3 (AP3) and Pistillata (PI), which act locally to specify
the identity of the floral meristem and to determine sepal and petal development [4]. - Antirrhinum majus and tobacco
homeotic proteins deficiens (DEFA) and globosa (GLO) [5]. Both proteins are transcription factors involved in the
genetic control of flower development. Mutations in DEFA or GLO cause the transformation of petals into sepals and
of stamina into carpels. - Arabidopsis thaliana putative transcription factors AGL1 to AGL6 [6]. - Antirrhinum majus
morphogenetic protein DEF H33 (squamosa). In SRF, the conserved domain has been shown [1] to be involved in
DNA-binding and dimerization. A pattern that spans the complete length of the domain has been derived. The profile
also spans the length of the MADS-box.
Consensus pattern: R-x-[RK]-x(5)-I-x-[DNGSK]-x(3)-[KR]-x(2)-T-[FY]-x-[RK](3)-x(2)-[LIVM]-x-K(2)-A-x-E-[LIVM]-[STA]-x-L-x(4)-[LIVM]-x-[LIVM](3)-x(6)-[LIVMF]-x(2)-[FY]

[ 1] Norman C., Runswick M., Pollock R., Treisman R. Cell 55:989-1003(1988). [ 2] Passmore S., Maine G.T., Elble R.,
Christ C., Tye B.-K. J. Mol. Biol. 204:593-606(1988). [ 3] Yanofsky M., Ma H., Bowman J., Drews G., Feldmann K.A.,
Meyerowitz E.M. Nature 346:35-39(1990). [ 4] Goto K., Meyerowitz E.M. Genes Dev. 8:1548-1560(1994). [ 5] Troebner
W., Ramirez L., Motte P., Hue I., Huijser P., Loennig W.-E., Saedler H., Sommer H., Schwarz-Sommer Z. EMBO J.
11:4693-4704(1992). [ 6] Ma H., Yanofsky M.F., Meyerowitz E.M. Genes Dev. 5:484-495(1991). [E1]
[1561] 671. Transketolase signatures

Transketolase (EC 2.2.1.1) (TK) catalyzes the reversible transfer of a two-carbon ketol unit from xylulose 5-phosphate
to an aldose acceptor, such as ribose 5-phosphate, to form sedoheptulose 7-phosphate and glyceraldehyde 3-phosphate.
This enzyme, together with transaldolase, provides a link between the glycolytic and pentose-phosphate pathways.
TK requires thiamin pyrophosphate as a cofactor. In most sources where TK has been purified, it is a homodimer
of approximately 70 Kd subunits. TK sequences from a variety of eukaryotic and prokaryotic sources [1,2] show that
the enzyme has been evolutionarily conserved. In the peroxisomes of the methylotrophic yeast Hansenula polymorpha,
there is a highly related enzyme, dihydroxy-acetone synthase (DHAS) (EC 2.2.1.3) (also known as formaldehyde
transketolase), which exhibits a very unusual specificity by including formaldehyde amongst its substrates.
1-deoxyxylulose-5-phosphate synthase (DXP synthase) [3] is an enzyme so far found in bacteria (gene dxs) and plants
(gene CLA1) which catalyzes the thiamin pyrophosphate-dependent acyloin condensation reaction between carbon atoms
2 and 3 of pyruvate and glyceraldehyde 3-phosphate to yield 1-deoxy-D-xylulose-5-phosphate (DXP), a precursor in
the biosynthetic pathway to isoprenoids, thiamin (vitamin B1), and pyridoxol (vitamin B6). DXP synthase is evolutionarily
related to TK. Two regions of TK have been selected as signature patterns. The first, located in the N-terminal section,
contains a histidine residue which appears to function in proton transfer during catalysis [4]. The second, located in the
central section, contains conserved acidic residues that are part of the active cleft and may participate in
substrate-binding [4].

Consensus pattern: R-x(3)-[LIVMTA]-[DENQSTHKF]-x(5,6)-[GSN]-G-H-[PLIVMF]-[GSTA]-x(2)-[LIMC]-[GS]
Consensus pattern: G-[DEQGSA]-[DN]-G-[PAEQ]-[ST]-[HQ]-x-[PAGM]-[LIVMYAC]-[DEFYW]-x(2)-[STAP]-x(2)-[RGA]

[ 1] Abedinia M., Layfield R., Jones S.M., Nixon P.F., Mattick J.S. Biochem. Biophys. Res. Commun. 183:1159-1166
(1992). [ 2] Fletcher T.S., Kwee I.L., Nakada T., Largman C., Martin B.M. Biochemistry 31:1892-1896(1992). [ 3]
Sprenger G.A., Schoerken U., Wiegert T., Grolle S., De Graaf A.A., Taylor S.V., Begley T.P., Bringer-Meyer S., Sahm
H. Proc. Natl. Acad. Sci. U.S.A. 94:12857-12862(1997). [ 4] Lindqvist Y., Schneider G., Ermler U., Sundstroem M. EMBO
J. 11:2373-2379(1992).

[1562] 672. Transmembrane 4 family signature

Recently a number of eukaryotic cell surface antigens have been found to be evolutionarily related [1,2,3]. The proteins
known to belong to this family are listed below: - Mammalian antigen CD9 (MIC3), a protein involved in platelet activation
and aggregation. - Mammalian leukocyte antigen CD37, expressed on B lymphocytes. - Mammalian leukocyte antigen
CD53 (OX-44), which may be involved in growth regulation in hematopoietic cells. - Mammalian lysosomal membrane
protein CD63 (melanoma-associated antigen ME491; antigen AD1). - Mammalian antigen CD81 (cell surface protein
TAPA-1), which may play an important role in the regulation of lymphoma cell growth. - Mammalian antigen CD82
(protein R2; antigen C33; Kangai 1 (KAI1)), which associates with CD4 or CD8 and delivers costimulatory signals for
the TCR/CD3 pathway. - Mammalian antigen CD151 (SFA-1; platelet-endothelial tetraspan antigen 3 (PETA-3)). -
Mammalian cell surface glycoprotein A15 (TALLA-1; MXS1). - Mammalian novel antigen 2 (NAG-2). - Human
tumor-associated antigen CO-029. - Schistosoma mansoni and japonicum 23 Kd surface antigen (SM23 / SJ23). These
proteins share the following characteristics: they all seem to be type III membrane proteins (type III proteins are integral
membrane proteins that contain an N-terminal membrane-anchoring domain which is not cleaved during biosynthesis
and which functions both as a translocation signal and as a membrane anchor); they also contain three additional
transmembrane regions and at least seven conserved cysteine residues, and are of approximately the same size (218
to 284 residues). These proteins are collectively known as the 'transmembrane 4 superfamily' (TM4) because they span
the plasma membrane four times.

[Schematic diagram of the domain structure, in order: TMa | Extra | TM2 | Cyt | TM3 | Extracellular | TM4 | Cyt]

Cyt: cytoplasmic domain. TMa: transmembrane anchor. TM2 to TM4: transmembrane regions 2 to 4. C: conserved
cysteine. *: position of the pattern. A conserved region that includes two cysteines and seems to be located in a short
cytoplasmic loop between two transmembrane domains has been selected as a signature for these proteins.
Consensus pattern: [...]-[LIVM](2)

A conserved region that contains three conserved acidic residues has been selected as a signature pattern for the
alpha chain. The first and third residues are believed to serve as proton donors/acceptors in the enzyme's catalytic
mechanism.

Consensus pattern: [LIVM]-E-[LIVM]-G-x(2)-[FYC]-[ST]-[DE]-[PA]-[LIVMY]-[AGLI]-[DE]-[...]
[ 1] Crawford I.P. Annu. Rev. Microbiol. 43:567-600(1989). [ 2] Hyde C.C., Miles E.W. Bio/Technology 8:27-32(1990).
[ 3] Berlyn M.B., Last R.L., Fink G.R. Proc. Natl. Acad. Sci. U.S.A. 86:4604-4608(1989).

[1564] 674. Tryptophan synthase beta chain pyridoxal-phosphate attachment site

Tryptophan synthase catalyzes the last step in the biosynthesis of tryptophan: the conversion of indoleglycerol
phosphate and serine to tryptophan and glyceraldehyde 3-phosphate [1,2]. It has two functional domains: one for the
aldol cleavage of indoleglycerol phosphate to indole and glyceraldehyde 3-phosphate and the other for the synthesis of
tryptophan from indole and serine. In bacteria and plants [3], each domain is found on a separate subunit (alpha and
beta chains), while in fungi the two domains are fused together on a single multifunctional protein. The beta chain of
the enzyme requires pyridoxal-phosphate as a cofactor. The pyridoxal-phosphate group is attached to a lysine residue.
The region around this lysine residue also contains two histidine residues which are part of the pyridoxal-phosphate
binding site. The signature pattern for the tryptophan synthase beta chain is derived from that conserved region.

Consensus pattern: [LIVM]-x-H-x-G-[STA]-H-K-x-N [K is the pyridoxal-P attachment site]

[ 1] Crawford I.P. Annu. Rev. Microbiol. 43:567-600(1989). [ 2] Hyde C.C., Miles E.W. Bio/Technology 8:27-32(1990).
[ 3] Berlyn M.B., Last R.L., Fink G.R. Proc. Natl. Acad. Sci. U.S.A. 86:4604-4608(1989).

The catalytic activity of the serine proteases from the trypsin family is provided by a charge relay system involving an
aspartic acid residue hydrogen-bonded to a histidine, which itself is hydrogen-bonded to a serine. The sequences in
the vicinity of the active site serine and histidine residues are well conserved in this family of proteases. A list of
proteases known to belong to the trypsin family is shown below. - Acrosin. - Blood coagulation factors VII, IX, X,
XI and XII, thrombin, plasminogen, and protein C. - Cathepsin G. - Chymotrypsins. - Complement components C1r,
[...] factor [...]. - Cytotoxic [...] Wegener's autoantigen). - Plasminogen activators (urokinase-type, and tissue-type).
- Trypsins I, II, III, and IV. - Tryptases. - Snake venom proteases such as ancrod, batroxobin, cerastobin, flavoxobin,
and protein C activator. - Collagenase from common cattle grub and collagenolytic protease from Atlantic sand fiddler
crab. - Apolipoprotein(a). - Blood fluke cercarial protease. - Drosophila trypsin-like proteases: alpha, easter, [...]. -
Drosophila protease stubble (gene sb). - Major mite fecal allergen Der p III. All the above proteins belong to family S1
in the classification of peptidases [2,E1] and originate from eukaryotic species. It should be noted that bacterial
proteases that belong to family [...] are similar enough in the regions of the active site residues that they can be picked
up by the same patterns. These proteases are listed below. - Achromobacter lyticus protease I. - Lysobacter alpha-lytic
protease. - Streptogrisin A and B (Streptomyces proteases A and B). - Streptomyces griseus glutamyl endopeptidase II.
- Streptomyces fradiae proteases 1 and 2.

[1566] 676. (tsp) Thrombospondin type 1 domain 
[1567] [1] Bork P; FEBS Lett 1993;327:125-130.

[1568] 677. Tubulin subunits alpha, beta, and gamma signature 

Tubulins [1,2], the major constituents of microtubules, are dimeric proteins which consist of two closely related subunits
(alpha and beta). Tubulin binds two molecules of GTP at two different sites (N and E). At the E (Exchangeable) site,
GTP is hydrolyzed during incorporation into the microtubule. Near the E site is an invariant region rich in glycines which
is found in both chains and which is now [3] said to control the access of the nucleotide to its binding site. A signature
pattern was developed from this region. With the exception of the simple eukaryotes, most species express a variety
of closely related alpha and beta isotypes. In most species there is a third member of the tubulin family: gamma tubulin.
Gamma tubulin is found at microtubule organizing centers (MTOC) such as the spindle poles or the centrosome,
suggesting that it is involved in the minus-end nucleation of microtubule assembly [4].
Consensus pattern: [SAG]-G-G-T-G-[SA]-G

[ 1] Cleveland D.W., Sullivan K.F. Annu. Rev. Biochem. 54:331-365(1985). [ 2] Joshi H.C., Cleveland D.W. Cell Motil.
Cytoskeleton 16:159-163(1990). [ 3] Hesse J., Thierauf M., Ponstingl H. J. Biol. Chem. 262:15472-15475(1987). [ 4]
Joshi H.C. BioEssays 15:637-643(1993).
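
As an illustration of how a signature of this kind can be applied, the following minimal sketch scans a protein sequence for the glycine-rich tubulin signature given above. The regular expression is a hand translation of [SAG]-G-G-T-G-[SA]-G; the function name and the test fragment are hypothetical and are not taken from any sequence listing.

```python
import re

# Hand translation of the signature [SAG]-G-G-T-G-[SA]-G into a regular expression.
TUBULIN_SIGNATURE = re.compile(r"[SAG]GGTG[SA]G")


def find_signature(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched residues) for every non-overlapping hit."""
    return [
        (match.start() + 1, match.group())
        for match in TUBULIN_SIGNATURE.finditer(sequence.upper())
    ]


if __name__ == "__main__":
    # Invented fragment, used only to exercise the function.
    fragment = "AAHSLGGGTGSGMGTLLKK"
    for position, residues in find_signature(fragment):
        print(f"signature {residues} found at residue {position}")
```

Positions are reported 1-based, as is usual for residue numbering. A hit only indicates that the sequence is a candidate family member; it is not proof of function.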

[1569] Tubulin-beta mRNA autoregulation signal

The stability of beta-tubulin mRNAs is autoregulated by their own translation product [1]. Unpolymerized tubulin
subunits bind directly (or activate a factor(s) which binds co-translationally) to the nascent N-terminus of beta-tubulin.
This binding is transduced through the adjacent ribosomes to activate an RNAse that degrades the polysome-bound mRNA.
The recognition element has been shown to be the first four amino acids of beta-tubulin: Met-Arg-Glu-Ile. Mutations
to this sequence abolish the autoregulation effect (except for the replacement of Glu by Asp); transposition of this
sequence to an internal region of a polypeptide also suppresses the autoregulatory effect.
Consensus pattern: <M-R-[DE]-[IL]

[ 1] Cleveland D.W. Trends Biochem. Sci. 13:339-343(1988). 
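
The consensus patterns in this description use the PROSITE notation: residues are separated by hyphens, x stands for any residue, square brackets list the accepted residues, curly braces list the excluded residues, a number or range in parentheses gives a repeat count, and < or > anchors the pattern to the N- or C-terminus. The sketch below shows one possible way to translate such a pattern into a Python regular expression. The function name prosite_to_regex is hypothetical, and the sketch covers only the notation elements just listed.

```python
import re

# One pattern element: a residue specification with an optional repeat count.
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression string."""
    pattern = pattern.strip().rstrip(".")
    anchored_start = pattern.startswith("<")
    anchored_end = pattern.endswith(">")
    pattern = pattern.lstrip("<").rstrip(">")

    parts = []
    for element in pattern.split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unrecognized pattern element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            residue = "."                          # any residue
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"   # any residue except those listed
        if low is not None:
            residue += "{" + low + ("," + high if high else "") + "}"
        parts.append(residue)

    return ("^" if anchored_start else "") + "".join(parts) + ("$" if anchored_end else "")


if __name__ == "__main__":
    regex = prosite_to_regex("<M-R-[DE]-[IL]")
    print(regex)
    # Invented N-terminal fragments, for illustration only.
    for fragment in ("MREIVHIQ", "MRDLVHIQ", "MRKIVHIQ", "AAMREIVH"):
        print(fragment, bool(re.search(regex, fragment)))
```

Because the beta-tubulin pattern starts with <, the translated expression is anchored to the first residue, which mirrors the statement above that moving the tetrapeptide to an internal position suppresses autoregulation.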

[1570] 678. (tRNA-synt 2c) Aminoacyl-transfer RNA synthetases class-II signatures

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them to
specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty
different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally two
aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and one mitochondrial form. While all
these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure.
The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine,
and threonine are referred to as class-II synthetases [2 to 6] and probably have a common folding pattern in their
catalytic domain for the binding of ATP and amino acid, which is different from the Rossmann fold observed for the
class I synthetases [7]. Class-II tRNA synthetases do not share a high degree of similarity; however, at least three
conserved regions are present [2,5,8]. Signature patterns have been derived from two of these regions.

Consensus pattern: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]

Consensus pattern: [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVMFY]
[1571] [ 1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987). [ 2] Delarue M., Moras D. BioEssays 15:675-687
(1993). [ 3] Schimmel P. Trends Biochem. Sci. 16:1-3(1991). [ 4] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.
A. 88:8121-8125(1991). [ 5] Cusack S., Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991). [ 6] Cusack
S. Biochimie 75:1077-1081(1993). [ 7] Cusack S., Berthet-Colominas C., Haertlein M., Nassar N., Leberman R. Nature
347:249-255(1990). [ 8] Leveque F., Plateau P., Dessen P., Blanquet S. Nucleic Acids Res. 18:305-312(1990).
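
The two class-II signatures illustrate a variable-length gap, x(4,12), and an exclusion set, {DENQHRKP}. The sketch below checks a sequence against both. It assumes that the printed patterns read [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE] and [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVMFY]; the labels motif_1 and motif_2 and the test sequences are hypothetical.

```python
import re

# Hand translations of the two class-II signatures (assumed readings, see above).
SIGNATURES = {
    # x(4,12) becomes .{4,12}: between 4 and 12 residues of any kind.
    "motif_1": re.compile(r"[FYH]R.[DE].{4,12}[RH].{3}F.{3}[DE]"),
    # {DENQHRKP} becomes [^DENQHRKP]: any residue except those listed.
    "motif_2": re.compile(r"[GSTALVF][^DENQHRKP][GSTA][LIVMF][DE]R[LIVMF].[LIVMSTAG][LIVMFY]"),
}


def class_ii_signatures(sequence: str) -> dict[str, bool]:
    """Report, for each signature, whether it occurs anywhere in the sequence."""
    seq = sequence.upper()
    return {name: regex.search(seq) is not None for name, regex in SIGNATURES.items()}


if __name__ == "__main__":
    # Invented sequences, built only to exercise each pattern.
    print(class_ii_signatures("MKFRAEAAAAARAAAFAAADKK"))
    print(class_ii_signatures("GASLERLAGL"))
    print(class_ii_signatures("GDSLERLAGL"))  # D in the second position is excluded
```

A sequence matching either pattern would only be a candidate class-II synthetase; the text notes that the family members do not share a high degree of overall similarity.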
[1572] 679. UBA-domain

[1573] The UBA-domain (ubiquitin associated domain) is a novel sequence motif found in several proteins having
connections to ubiquitin and the ubiquitination pathway. The structure of the UBA domain consists of a compact three
helix bundle [1]. Number of members: 84

[1574] [1] Structure of a human DNA repair protein UBA domain that interacts with HIV-1 Vpr. Dieckmann T,
Withers-Ward ES, Jarosinski MA, Liu CF, Chen IS, Feigon J; Nat Struct Biol 1998;5:1042-1047.
[1575] 680. UBX domain

Domain present in ubiquitin-regulatory proteins. Present in FAF1 and Shp1p. Number of members: 19
[1] The UBA domain: a sequence motif present in multiple enzyme classes of the ubiquitination pathway. Hofmann K,
Bucher P; Trends Biochem Sci 1996;21:172-173.

[1576] 681. (UCH) Ubiquitin carboxyl-terminal hydrolases family 1 cysteine active site 

Ubiquitin carboxyl-terminal hydrolases (UCH) (deubiquitinating enzymes) [1,2] are thiol proteases that recognize and
hydrolyze the peptide bond at the C-terminal glycine of ubiquitin. These enzymes are involved in the processing of
poly-ubiquitin precursors as well as that of ubiquitinated proteins. There are two distinct families of UCH. The first class
consists of enzymes of about 25 Kd and is currently represented by: - Mammalian isozymes L1 and L3. - Yeast YUH1.
- Drosophila Uch. One of the active site residues of class-I UCH [3] is a cysteine. A signature pattern has been derived
from the region around that residue.
Consensus pattern: Q-x(3)-N-[SA]-C-G-x(3)-[LIVM](2)-H-[SA]-[LIVM]-[SA] [C is the active site residue]
[ 1] Jentsch S., Seufert W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991). [ 2] D'Andrea A., Pellman D. Crit.
Rev. Biochem. Mol. Biol. 33:337-352(1998). [ 3] Johnston S.C., Larsen C.N., Cook W.J., Wilkinson K.D., Hill C.P. EMBO
J. 16:3787-3796(1997). [ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994).
[1577] 682. Ubiquitin carboxyl-terminal hydrolases family 2 signatures (UCH-1)

Ubiquitin carboxyl-terminal hydrolases (UCH) (deubiquitinating enzymes) [1,2] are thiol proteases that recognize and
hydrolyze the peptide bond at the C-terminal glycine of ubiquitin. These enzymes are involved in the processing of
poly-ubiquitin precursors as well as that of ubiquitinated proteins. There are two distinct families of UCH. The second
class consists of large proteins (800 to 2000 residues) and is currently represented by: - Yeast UBP1, UBP2, UBP3,
UBP4 (or DOA4/SSV7), UBP5, UBP7, UBP9, UBP10, UBP11, UBP12, UBP13, UBP14, UBP15 and UBP16. - Human
tre-2. - Human isopeptidase T. - Human isopeptidase T-3. - Mammalian Ode-1. - Mammalian Unp. - Mouse Dub-1. -
Drosophila fat facets protein (gene faf). - Mammalian faf homolog. - Drosophila D-Ubp-64E. - Caenorhabditis elegans
hypothetical protein R10E11.3. - Caenorhabditis elegans hypothetical protein K02C4.3. These proteins only share two
regions of similarity. The first region contains a conserved cysteine which is probably implicated in the catalytic
mechanism. The second region contains two conserved histidine residues, one of which is also probably implicated in the
catalytic mechanism. Signature patterns for both conserved regions have been developed.
Consensus pattern: G-[LIVMFY]-x(1,3)-[AGC]-[NASM]-x-C-[FYW]-[LIVMC]-[NST]-[...]

Consensus pattern: [...]-x-L-x-[SAG]-[LIVMFT]-x(2)-H-x-G-x(4,5)-G-H-Y [The two H's are putative active site residues]

[ 1] Jentsch S., Seufert W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991). [ 2] D'Andrea A., Pellman D. Crit.
Rev. Biochem. Mol. Biol. 33:337-352(1998). [ 3] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994).
[1578] 683. Ubiquitin carboxyl-terminal hydrolases family 2 signatures (UCH-2)

Ubiquitin carboxyl-terminal hydrolases (UCH) (deubiquitinating enzymes) [1,2] are thiol proteases that recognize and
hydrolyze the peptide bond at the C-terminal glycine of ubiquitin. These enzymes are involved in the processing of
poly-ubiquitin precursors as well as that of ubiquitinated proteins. There are two distinct families of UCH. The second
class consists of large proteins (800 to 2000 residues) and is currently represented by: - Yeast UBP1, UBP2, UBP3,
UBP4 (or DOA4/SSV7), UBP5, UBP7, UBP9, UBP10, UBP11, UBP12, UBP13, UBP14, UBP15 and UBP16. - Human
tre-2. - Human isopeptidase T. - Human isopeptidase T-3. - Mammalian Ode-1. - Mammalian Unp. - Mouse Dub-1. -
Drosophila fat facets protein (gene faf). - Mammalian faf homolog. - Drosophila D-Ubp-64E. - Caenorhabditis elegans
hypothetical protein R10E11.3. - Caenorhabditis elegans hypothetical protein K02C4.3. These proteins only share two
regions of similarity. The first region contains a conserved cysteine which is probably implicated in the catalytic
mechanism. The second region contains two conserved histidine residues, one of which is also probably implicated in the
catalytic mechanism. Signature patterns for both conserved regions have been developed.
Consensus pattern: G-[LIVMFY]-x(1,3)-[AGC]-[NASM]-x-C-[FYW]-[LIVMC]-[NST]-[...]

Consensus pattern: [...]-x-L-x-[SAG]-[LIVMFT]-x(2)-H-x-G-x(4,5)-G-H-Y [The two H's are putative active site residues]
[ 1] Jentsch S., Seufert W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991). [ 2] D'Andrea A., Pellman D. Crit.
Rev. Biochem. Mol. Biol. 33:337-352(1998). [ 3] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994).

[1579] 684. UDP-glycosyltransferases signature

UDP glycosyltransferases (UGT) are a superfamily of enzymes that catalyze the addition of the glycosyl group from
a UTP-sugar to a small hydrophobic molecule. This family currently consists of: - Mammalian UDP-glucuronosyl
transferases (UDPGT) [1,2]. A large family of membrane-bound microsomal enzymes which catalyze the transfer of
glucuronic acid to a wide variety of exogenous and endogenous lipophilic substrates. These enzymes are of major
importance in the detoxification and subsequent elimination of xenobiotics such as drugs and carcinogens. - A large
number of putative UDPGT from Caenorhabditis elegans. - Mammalian 2-hydroxyacylsphingosine 1-beta-galactosyl-
transferase [3] (also known as UDP-galactose-ceramide galactosyltransferase). This enzyme catalyzes the transfer
of galactose to ceramide, a key enzymatic step in the biosynthesis of galactocerebrosides, which are abundant
sphingolipids of the myelin membrane of the central nervous system and peripheral nervous system. - Plants flavonol
O(3)-glucosyltransferase. An enzyme [4] that catalyzes the transfer of glucose from UDP-glucose to [...]. This reaction
is essential and one of the last steps in anthocyanin pigment biosynthesis. - Baculoviruses ecdysteroid
UDP-glucosyltransferase (EC 2.4.1.-) [5] (egt). This enzyme catalyzes the transfer of glucose from UDP-glucose to
ecdysteroids, which are insect molting hormones. The expression of egt in the insect host interferes with the normal
insect development by blocking the molting process. - Prokaryotic zeaxanthin glucosyl transferase (gene crtX), an
enzyme involved in carotenoid biosynthesis that catalyzes the glycosylation reaction which converts zeaxanthin to
zeaxanthin diglucoside. - Streptomyces macrolide glycosyltransferases [6]. These enzymes specifically inactivate
macrolide antibiotics via 2'-O-glycosylation using UDP-glucose. These enzymes share a conserved domain of about 50
amino acid residues located in their C-terminal section and from which a pattern has been extracted to detect them.
Consensus pattern: [FW]-x(2)-Q-x(2)-[LIVMYA]-[LIMV]-x(4,6)-[LVGAC]-[LVFYA]-[LIVMF]-[STAGCM]-[HNQ]-[STAGC]-G-x(2)-[STAG]-x(3)-[STAGL]-[LIVMFA]-x(4)-[PQR]-[LIVMT]-x(3)-[PA]-x(3)-[DES]-[QEHN]
[ 1] Dutton G.J. (In) Glucuronidation of drugs and other compounds, Dutton G.J., Ed., pp 1-78, CRC Press, Boca Raton,
(1980). [ 2] Burchell B., Nebert D.W., Nelson D.R., Bock K.W., Iyanagi T., Jansen P.L., Lancet D., Mulder G.J.,
Chowdhury J.R., Siest G., Tephly T.R., Mackenzie P.I. DNA Cell Biol. 10:487-494(1991). [ 3] Schulte S., Stoffel W. Proc.
Natl. Acad. Sci. U.S.A. 90:10265-10269(1993). [ 4] Furtek D., Schiefelbein J.W., Johnston F., Nelson O.E. Jr. Plant Mol.
Biol. 11:473-481(1988). [ 5] O'Reilly D.R., Miller L.K. Science 245:1110-1112(1989). [ 6] Hernandez C., Olano C.,
Mendez C., Salas J.A. Gene 134:139-140(1993).
[1580] 685. UDP-glucose/GDP-mannose dehydrogenase family 

[1581] The UDP-glucose/GDP-mannose dehydrogenases are a small group of enzymes which possess the
ability to catalyze the NAD-dependent 2-fold oxidation of an alcohol to an acid without the release of an aldehyde
intermediate [2]. Number of members: 55

[1582] [1] Purification and characterization of guanosine diphospho-D-mannose dehydrogenase. A key enzyme in
the biosynthesis of alginate by Pseudomonas aeruginosa. Roychoudhury S, May TB, Gill JF, Singh SK, Feingold DS,
Chakrabarty AM; J Biol Chem 1989;264:9380-9385. [2] Properties and kinetic analysis of UDP-glucose dehydrogenase
from group A streptococci. Irreversible inhibition by UDP-chloroacetol. Campbell RE, Sala RF, van de Rijn I, Tanner
ME; J Biol Chem 1997;272:3416-3422.
[1583] 686. Uracil-DNA glycosylase signature 

Uracil-DNA glycosylase (EC 3.2.2.-) (UNG) [1] is a DNA repair enzyme that excises uracil residues from DNA by
cleaving the N-glycosylic bond. Uracil in DNA can arise as a result of misincorporation of dUMP residues by DNA
polymerase or deamination of cytosine. The sequence of uracil-DNA glycosylase is extremely well conserved [2] in
bacteria and eukaryotes as well as in herpes viruses. More distantly related uracil-DNA glycosylases are also found
in poxviruses [3]. In eukaryotic cells, UNG activity is found in both the nucleus and the mitochondria. Human UNG1
protein is transported to both the mitochondria and the nucleus [4]. The N-terminal 77 amino acids of UNG1 seem to
be required for mitochondrial localization [4], but the presence of a mitochondrial transit peptide has not been directly
demonstrated. As a signature for this type of enzyme, the most N-terminal conserved region has been selected. This
region contains an aspartic acid residue which has been proposed, based on X-ray structures [5,6], to act as a general
base in the catalytic mechanism.

Consensus pattern: [KR]-[LIV]-[LIVC]-[LIVM]-x-G-[QI]-D-P-Y [D is the active site residue]

[ 1] Sancar A., Sancar G.B. Annu. Rev. Biochem. 57:29-67(1988). [ 2] Olsen L.C., Aasland R., Wittwer C.U., Krokan
H.E., Helland D.E. EMBO J. 8:3121-3125(1989). [ 3] Upton C., Stuart D.T., McFadden G. Proc. Natl. Acad. Sci. U.S.
A. 90:4518-4522(1993). [ 4] Slupphaug G., Markussen F.-H., Olsen L.C., Aasland R., Aarsaether N., Bakke O., Krokan
H.E., Helland D.E. Nucleic Acids Res. 21:2579-2584(1993). [ 5] Savva R., McAuley-Hecht K., Brown T., Pearl L. Nature
373:487-493(1995). [ 6] Mol C.D., Arvai A.S., Slupphaug G., Kavli B., Alseth I., Krokan H.E., Tainer J.A. Cell 80:869-878
(1995). [ 7] Muller S.J., Caradonna S. Biochim. Biophys. Acta 1088:197-207(1991). [ 8] Meyer-Siegler K., Mauro D.J.,
Seal G., Wurzer J., Deriel J.K., Sirover M.A. Proc. Natl. Acad. Sci. U.S.A. 88:8460-8464(1991). [ 9] Muller S.J.,
Caradonna S. J. Biol. Chem. 268:1310-1319(1993). [10] Barnes D.E., Lindahl T., Sedgwick B. Curr. Opin. Cell Biol. 5:424-433
(1993).
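
Several consensus patterns in this description carry a bracketed note identifying a functional residue, such as the aspartic acid in the uracil-DNA glycosylase signature. The sketch below shows one way to report the position of that residue within a sequence by wrapping it in a named group. It assumes the pattern reads [KR]-[LIV]-[LIVC]-[LIVM]-x-G-[QI]-D-P-Y; the function name and the test fragment are hypothetical.

```python
import re

# Hand translation of the signature, with the annotated aspartic acid captured
# in a named group so that its position can be reported separately.
UNG_SIGNATURE = re.compile(r"[KR][LIV][LIVC][LIVM].G[QI](?P<active_site>D)PY")


def active_site_positions(sequence: str) -> list[int]:
    """Return the 1-based position of the annotated D for every signature hit."""
    return [
        match.start("active_site") + 1
        for match in UNG_SIGNATURE.finditer(sequence.upper())
    ]


if __name__ == "__main__":
    # Invented fragment, for illustration only.
    fragment = "MSAKVVILGQDPYHGPNQ"
    print(active_site_positions(fragment))
```

The same approach applies to the other annotated patterns, for example the lysine in the tryptophan synthase beta chain signature, by moving the named group to the annotated residue.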

[1584] 687. Uncharacterized protein family UPF0001 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Yeast chromosome II
hypothetical protein YBL036c. - Caenorhabditis elegans hypothetical protein F09E5.8. - Bacillus subtilis hypothetical
protein ylmE. - Escherichia coli hypothetical protein yggS and HI0090, the corresponding Haemophilus influenzae
protein. - Helicobacter pylori hypothetical protein HP0395. - Mycobacterium tuberculosis hypothetical protein
MtCY270.20. - Synechocystis strain PCC 6803 hypothetical protein slr0556. - A Pseudomonas aeruginosa hypothetical
protein in pilT 5' region. - A Vibrio alginolyticus hypothetical protein in pilT 5' region. These are proteins of from 25 to
30 Kd which contain a number of conserved regions. The best conserved region, which is located in the first third of
these proteins, has been selected as a signature pattern.

Consensus pattern: [FW]-H-[FM]-[IV]-G-x-[LIV]-Q-x-[NKR]-K-x(3)-[LIV]
[ 1] Bairoch A., Rudd K.E. Unpublished observations (1996).

[1585] 688. Uncharacterized protein family UPF0003 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Escherichia coli protein
aefA. - Escherichia coli hypothetical protein yggB. - Escherichia coli hypothetical protein yjeP and HI0195.1, the
corresponding Haemophilus influenzae protein. - Escherichia coli hypothetical protein ynaI. - Bacillus subtilis hypothetical
protein yhdY. - Helicobacter pylori hypothetical protein HP0415. - Synechocystis strain PCC 6803 hypothetical protein
slr0639. - Archaeoglobus fulgidus hypothetical protein AF1546. - Methanococcus jannaschii hypothetical protein
MJ0170. - Methanococcus jannaschii hypothetical protein MJ1143. The sizes of these proteins range from 30 to 120 Kd.
They all contain a number of transmembrane regions. The best conserved region, which is located in and just after the
last potential transmembrane region, has been selected as a signature pattern.

Consensus pattern: G-[...]-V-x(7)-[LIVM]-x(6)-[LIVMF]-x(3)-[DQ]-x(3)-[LIV]-x-[LIV]-P-N-x(2)-[LIVMF]-[LIVFSTA]-x(5)-N

[ 1] Bairoch A. Unpublished observations (1997).

[1586] 689. Uncharacterized protein family UPF0004 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Escherichia coli
hypothetical protein yliG. - Escherichia coli hypothetical protein yleA and HI0019, the corresponding Haemophilus
influenzae protein. - Bacillus subtilis hypothetical protein yqeV. - Helicobacter pylori hypothetical protein HP0269. -
Helicobacter pylori hypothetical protein HP0285. - Mycoplasma iowae hypothetical protein in 16S RNA 5' region. -
Mycobacterium leprae hypothetical protein B2235_C2_195. - Pseudomonas aeruginosa hypothetical protein in hemL
3' region. - Synechocystis strain PCC 6803 hypothetical protein slr0082. - Synechocystis strain PCC 6803 hypothetical
protein [...]. - Methanococcus jannaschii hypothetical protein MJ0865. - Methanococcus jannaschii hypothetical protein
MJ0867. - Caenorhabditis elegans hypothetical protein F25B5.5. The sizes of these proteins range from 47 to 61 Kd.
They contain six conserved cysteines, three of which are clustered in a region that can be used as a signature pattern.

Consensus pattern: [LIVM]-x-[LIVMT]-x(2)-G-C-x(3)-C-[STAN]-[FY]-C-x-[LIVM]-x(4)-G

[ 1] Bairoch A. Unpublished observations (1997).

[1587] 690. Uncharacterized protein family UPF0005 signature

The following proteins seem to be evolutionarily related [1]: - Mammalian protein TEGT (Testis Enhanced Gene
Transcript). - Escherichia coli hypothetical protein yccA and HI0044, the corresponding Haemophilus influenzae protein. -
[...] of yccA. These are proteins of about 25 Kd which seem to contain seven transmembrane domains. A signature
pattern that corresponds to a region that starts with the beginning of the third transmembrane domain and ends in the
middle of the fourth one has been developed.
Consensus pattern: G-[LIVM](2)-[SA]-x(5,8)-G-x(2)-[LIVM]-G-P-x-L-x(4)-[SAG]-x(4,6)-[LIVM](2)-x(2)-A-x(3)-T-A

[ 1] Walter L., Marynen P., Szpirer J., Levan G., Guenther E. Genomics 28:301-304(1995).
[1588] 691. Uncharacterized protein family UPF0006 signatures

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Yeast chromosome II
hypothetical protein YBL055c. - Escherichia coli hypothetical protein ycfH and HI0454, the corresponding Haemophilus
influenzae protein. - Escherichia coli hypothetical protein yigW. - Escherichia coli hypothetical protein [...] and [...],
the corresponding Haemophilus influenzae protein. - Bacillus subtilis hypothetical protein yabD. - Haemophilus
influenzae hypothetical protein HI1664. - Mycoplasma genitalium hypothetical protein MG009. These are proteins of from
[...] to 47 Kd which contain a number of conserved regions. They can be picked up in the database by the following
patterns.

Consensus pattern: [LIVMFY](2)-D-[STA]-H-x-H-[LIVMF]-[DN]

Consensus pattern: P-[LIVM]-x-[LIVM]-H-x-R-x-[TA]-x-[DE]

Consensus pattern: [LVSA]-[LIVA]-x(2)-[LIVM]-[PS]-x(3)-L-[LIVM]-[LIVMS]-E-T-D-x-P
[ 1] Bairoch A., Rudd K.E. Unpublished observations (1995).

[1589] 692. Uncharacterized protein family UPF0007 signature

The following proteins seem to be evolutionarily related [1]: - Escherichia coli hypothetical protein ygbP and HI0672,
the corresponding Haemophilus influenzae protein. - Bacillus subtilis hypothetical protein yacM. - Mycobacterium
tuberculosis hypothetical protein MtCY06G11.29c. - Synechocystis strain PCC 6803 hypothetical protein [...]. -
Rhodobacter capsulatus hypothetical protein in nifR3 5' region. Except for the Rhodobacter protein, which contains a
C-terminal extension, all these proteins have from 225 to 236 amino acids. They are hydrophilic proteins that can be
picked up in the database by the following pattern.
Consensus pattern: V-L-[IV]-H-D-[GA]-A-R
[ 1] Bairoch A. Unpublished observations (1997).

[1590] 693. Uncharacterized protein family UPF0015 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Yeast chromosome II
hypothetical protein YBR002c. - Yeast chromosome XIII hypothetical protein YMR101c. - Escherichia coli hypothetical
protein [...] and HI0920, the corresponding Haemophilus influenzae protein. - Helicobacter pylori hypothetical protein
HP1221. - Mycobacterium leprae hypothetical protein B1937_F2_65. - A Corynebacterium glutamicum hypothetical
protein in aroF [...] region. - A Streptomyces fradiae hypothetical protein in transposon Tn4556. - Synechocystis strain
PCC 6803 hypothetical protein sll0505. - Methanococcus jannaschii hypothetical protein MJ1372. These are proteins
of about 26 to [...] Kd whose central region is well conserved. They can be picked up in the database by the following
pattern.

Consensus pattern: [DQ]-[LIVMF](3)-R-T-[SG]-G-x(2)-R-x-S-x-[FY]-[LIVM](2)-W-Q

[ 1] Wolfe K.H., Lohan A.J.E. Yeast 10:S41-S46(1994).

[1591] 694. Uncharacterized protein family UPF0016 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Yeast hypothetical protein
YBR187w. - Fission yeast hypothetical protein SpAC17G8.08c. - Mouse protein pFT27. - Synechocystis strain PCC
6803 hypothetical protein sll0615. These are hydrophobic proteins of 200 to 320 amino acids that seem to contain six
or seven transmembrane domains. A conserved region which seems, in the eukaryotic proteins of this family, to directly
follow the second transmembrane domain has been selected as a signature pattern.
Consensus pattern: E-[LIVM]-G-D-K-T-F-[LIVMF](2)-A
[ 1] Bairoch A. Unpublished observations (1996).
[1592] 695. Uncharacterized protein family UPF0021 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Yeast chromosome VII
hypothetical protein YGL211w. - Dictyostelium discoideum protein veg136. - Methanococcus jannaschii hypothetical
proteins MJ1157 and MJ1478. These are proteins of from 300 to 360 residues. They can be picked up in the database
by the following pattern which is located in their N-terminal section.
Consensus pattern: C-K-x(2)-F-x(4)-E-x(22,23)-S-G-G-K-D

[ 1] Bairoch A. Unpublished observations (1997).

[1593] 696. Uncharacterized protein family UPF0023 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Mouse protein 22A3. -
Yeast chromosome XII hypothetical protein YLR022c. - Caenorhabditis elegans hypothetical protein W06E11.4. -
Methanococcus jannaschii hypothetical protein MJ0592. These are hydrophilic proteins of about 30 Kd. They can be
picked up in the database by the following pattern.

[1594] Consensus pattern: D-x-D-E-[LIV]-L-x(4)-V-F-x(3)-S-K-G-
[1595] [ 1] Bairoch A. Unpublished observations (1997).

[1596] 697. Uncharacterized protein family UPF0024 signature. The following uncharacterized proteins have been
shown [1] to share regions of similarities: - Escherichia coli hypothetical protein ygbO and HI0701, the corresponding
Haemophilus influenzae protein. - Helicobacter pylori hypothetical protein HP0926. - Yeast chromosome XV
hypothetical protein YOR243c. - Caenorhabditis elegans hypothetical protein B0024.11. - Methanococcus jannaschii
hypothetical proteins MJ0588 and MJ1364. These are hydrophilic proteins of from 39 to 77 Kd. They can be picked up
in the database by the following pattern.

[1597] Consensus pattern: G-x-K-D-[KR]-x-A-[LV]-T-x-Q-x-[LIVF]-[SGC]-
[ 1] Bairoch A. Unpublished observations (1997).

[1598] 698. Uncharacterized protein family UPF0025 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Escherichia coli
hypothetical protein yfcE. - Bacillus subtilis hypothetical protein ysnB. - Mycoplasma genitalium and pneumoniae
hypothetical protein MG207. - Methanococcus jannaschii hypothetical proteins MJ0623 and MJ0936. These are
hydrophilic proteins of about 20 Kd. They can be picked up in the database by the following pattern.

Consensus pattern: D-V-[LIV]-x(2)-G-H-[ST]-H-x(12)-[LIVMF]-N-P-G
[ 1] Bairoch A. Unpublished observations (1997).
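
Picking family members "up in the database" amounts to scanning every sequence record for the signature. The sketch below assumes the sequences are available as FASTA-formatted text and reads the UPF0025 pattern as D-V-[LIV]-x(2)-G-H-[ST]-H-x(12)-[LIVMF]-N-P-G. The record names and sequences are hypothetical toy data, and the helper names parse_fasta and scan are illustrative assumptions.

```python
import re

# Regex form of D-V-[LIV]-x(2)-G-H-[ST]-H-x(12)-[LIVMF]-N-P-G
UPF0025 = re.compile(r"DV[LIV].{2}GH[ST]H.{12}[LIVMF]NPG")

def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)

def scan(text, regex):
    """Return (header, 1-based start, matched segment) for every hit."""
    hits = []
    for header, seq in parse_fasta(text):
        for m in regex.finditer(seq):
            hits.append((header, m.start() + 1, m.group()))
    return hits

# Hypothetical toy records, not real database entries.
HIT_SEQ = "MSK" + "DVLAAGHTH" + "A" * 12 + "LNPG" + "KK"
MISS_SEQ = HIT_SEQ.replace("GHTH", "GHTQ")   # breaks the second conserved His
TOY_DB = ">toy_hit\n" + HIT_SEQ + "\n>toy_miss\n" + MISS_SEQ + "\n"

HITS = scan(TOY_DB, UPF0025)
```

Only the first toy record is reported, because the second one lacks the histidine required after [ST].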
[1599] 699. Uncharacterized protein family UPF0029 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Yeast chromosome III
hypothetical protein YCR59c. - Yeast chromosome IV hypothetical protein YDL177C. - Escherichia coli hypothetical
protein yigZ and HI0722, the corresponding Haemophilus influenzae protein. - Bacillus subtilis hypothetical protein
yvyE. - A Thermus aquaticus hypothetical protein in pol 5'region. These proteins can be picked up in the database by
the following pattern.

Consensus pattern: G-x(2)-[LIVM](2)-x(2)-[LIVM]-x(4)-[LIVM]-x(5)-[LIVM](2)-x-R-[FYW](2)-G-G-x(2)-[LIVM]-G
[ 1] Koonin E.V., Bork P., Sander C. EMBO J. 13:493-503(1994).

[1600] 700. Uncharacterized protein family UPF0030 signature

The following uncharacterized proteins have been shown [1] to be highly similar: - Yeast chromosome VI hypothetical
protein YFL060c. - Yeast chromosome XIII hypothetical protein YMR095c. - Yeast chromosome XIV hypothetical protein
YNL334C. - Bacillus subtilis hypothetical protein yaaE. - Haemophilus influenzae hypothetical protein HI1648. -
Methanococcus jannaschii hypothetical protein MJ1661. These are hydrophilic proteins of about 19 to 25 Kd. They can
be picked up in the database by the following pattern.

Consensus pattern: [GA]-L-I-[LIV]-P-G-G-E-S-T-[STA]

[ 1] Bairoch A. Unpublished observations (1997).

[1601] 701. Uncharacterized protein family UPF0032 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Escherichia coli
hypothetical protein yigU and HI0188, the corresponding Haemophilus influenzae protein. - Bacillus subtilis hypothetical
protein ycbT. - Mycobacterium tuberculosis hypothetical protein MtCY49.33c and U2126A, the corresponding
Mycobacterium leprae protein. - Synechocystis strain PCC 6803 hypothetical protein sll0194. - Odontella sinensis and
Porphyra purpurea chloroplast hypothetical protein ycf43. These proteins have from 245 to 317 amino acids and seem to
contain at least six or seven transmembrane regions. A conserved region located in the cen[illegible] signature pattern.

Consensus pattern: [illegible]

[ 1] [illegible] Unpublished observations (1996).

[1602] 702. Uncharacterized protein family UPF0034 signature

[illegible] - Escherichia coli hypo[illegible]

[illegible] R.G. Mol. Microbiol. 8:903-914(1993).

[1603] 703. Uncharacterized protein family UPF0038 signature

[illegible] Escherichia coli hypo[illegible]

[illegible] Unpublished observations (1997).

[illegible] pattern. [illegible]-G-x-[LIV]-C-[LIV]-x-[LIV] [C is the active site residue]

[illegible] Biochem. Sci. 15:[illegible]-198(1990). [ 2] [illegible]

[illegible] Biochim. Biophys. Acta 1089:127-139(1991).

[illegible] well conserved region which is located in the central section of the protein.

[ 1] Garey J.R., Labbe-Bois R., Chelstowska A., Rytka J., Harrison L., Kushner J., Labbe P. Eur. J. [illegible]
1011-1016(1992).
is involved in both ubiquinone and menaquinone biosynthesis and which catalyzes the S-adenosylmethionine
dependent methylation of 2-polyprenyl-6-methoxy-1,4-benzoquinol into 2-polyprenyl-3-methyl-6-methoxy-1,4-benzoquinol
and of demethylmenaquinol into menaquinol. - Yeast COQ5, a ubiquinone biosynthesis methyltransferase. - Bacillus
subtilis spore germination protein C2 (gene: gerCB or gerC2), a probable menaquinone biosynthesis methyltransferase.
- Lactococcus lactis gerC2 homolog. - Caenorhabditis elegans hypothetical protein ZK652.9. - Leishmania donovani
amastigote-specific protein A41. These are hydrophilic proteins of about 30 Kd (except for ZK652.9 which is 65 Kd).
They can be picked up in the database by the following patterns.
Consensus pattern: Y-D-x-M-N-x(2)-[LIVM]-S-x(3)-H-x(2)-W
Consensus pattern: R-V-[LIVM]-K-[PV]-G-G-x-[LIVMF]-x(2)-[LIVM]-E-x-S

[ 1] Lee P.T., Hsu A.Y., Ha H.T., Clarke C.F. J. Bacteriol. 179:1748-1754(1997).
[1607] 707. Uricase signature

Uricase (urate oxidase) [1] is the peroxisomal enzyme responsible for the degradation of urate into allantoin. Some
species, like primates and birds, have lost the gene for uricase and are therefore unable to degrade urate. Uricase is
a protein of 300 to 400 amino acids. A highly conserved region located in the central part of the sequence has been
used as a signature pattern.

Consensus pattern: [LV]-x-[LV]-[LIV]-K-[STV]-[ST]-x-[SN]-x-F-x(2)-[FY]-x(4)-[FY]-x(2)-L-x(5)-R
[ 1] Motojima K., Kanaya S., Goto S. J. Biol. Chem. 263:16677-16681(1988).
[1608] 708. Universal stress protein family (Usp) 

[1609] Members of the Usp family are expressed under a wide range of stress conditions [1]. They are predicted to be
related to the MADS-box proteins (transcript_fact) and to bind to DNA [2]. Number of members: 39

[1] Expression and role of the universal stress protein, UspA, of Escherichia coli during growth arrest. Nystrom T,
Neidhardt FC; Mol Microbiol 1994;11:537-544.

[2] Sequence analysis of eukaryotic developmental proteins: ancient and novel domains. Mushegian AR, Koonin
EV; Genetics 1996;144:817-828.

[1610] 709. Ubiquitin domain signature and profile 

Ubiquitin [1,2,3] is a protein of seventy six amino acid residues, found in all eukaryotic cells and whose sequence is
extremely well conserved from protozoan to vertebrates. It plays a key role in a variety of cellular processes, such as
ATP-dependent selective degradation of cellular proteins, maintenance of chromatin structure, regulation of gene
expression, stress response and ribosome biogenesis. In most species, there are many genes coding for ubiquitin.
However they can be classified into two classes. The first class produces polyubiquitin molecules consisting of exact head
to tail repeats of ubiquitin. The number of repeats is variable (up to twelve in a Xenopus gene). In the majority of
polyubiquitin precursors, there is a final amino-acid after the last repeat. The second class of genes produces precursor
proteins consisting of a single copy of ubiquitin fused to a C-terminal extension protein (CEP). There are two types of
CEP proteins and both seem to be ribosomal proteins. Ubiquitin is a globular protein, the last four C-terminal residues
(Leu-Arg-Gly-Gly) extending from the compact structure to form a 'tail', important for its function. The latter is mediated
by the covalent conjugation of ubiquitin to target proteins, by an isopeptide linkage between the C-terminal glycine and
the epsilon amino group of lysine residues in the target proteins. There are a number of proteins which are evolutionarily
related to ubiquitin: - Ubiquitin-like proteins from baculoviruses as well as in some strains of bovine viral diarrhea viruses
(BVDV). These proteins are highly similar to their eukaryotic counterparts. - Mammalian protein GDX [4]. GDX is
composed of two domains, an N-terminal ubiquitin-like domain of 74 residues and a C-terminal domain of 83 residues with
some similarity with the thyroglobulin hormonogenic site. - Mammalian protein FAU [5]. FAU is a fusion protein which
consists of an N-terminal ubiquitin-like protein of 74 residues fused to ribosomal protein S30. - Mouse protein NEDD-8
[6], a ubiquitin-like protein of 81 residues. - Human protein BAT3, a large fusion protein of 1132 residues that contains
an N-terminal ubiquitin-like domain. - Caenorhabditis elegans protein ubl-1 [7]. Ubl-1 is a fusion protein which consists
of an N-terminal ubiquitin-like protein of 70 residues fused to ribosomal protein S27A. - Yeast DNA repair protein RAD23
[8]. RAD23 contains an N-terminal domain that seems to be distantly, yet significantly, related to ubiquitin. - Mammalian
RAD23-related proteins RAD23A and RAD23B. - Mammalian BCL-2 binding athanogene-1 (BAG-1). BAG-1 is a protein
of 274 residues that contains a central ubiquitin-like domain. - Human spliceosome associated protein 114 (SAP 114
or SF3A120). - Yeast protein DSK2, a protein involved in spindle pole body duplication and which contains an N-terminal
ubiquitin-like domain. - Human protein CKAP1/TFCB, Schizosaccharomyces pombe protein alp11 and Caenorhabditis
elegans hypothetical protein F53F4.3. These proteins contain an N-terminal ubiquitin domain and a C-terminal
CAP-Gly domain. - Schizosaccharomyces pombe hypothetical protein SpAC26A3.16. This protein contains an N-terminal
ubiquitin domain. - Yeast protein SMT3. - Human ubiquitin-like proteins SMT3A and SMT3B. - Human ubiquitin-like
protein SMT3C (also known as PIC1; Ubl1; Sumo-1; Gmp-1 or Sentrin). This protein is involved in targeting ranGAP1
to the nuclear pore complex protein ranBP2. - SMT3-like proteins in plants and Caenorhabditis elegans. To identify
ubiquitin and related proteins, a pattern has been developed based on conserved positions in the central section of
[illegible] Bio/Technology 8:209-215(1990). [ 3] [illegible]

[illegible] M., Tribioli C., Toniolo D. Genomics 7:453-457(1990). [illegible]

[ 6] Kumar S., Yoshida Y., Noda M. Biochem. Biophys. Res. Commun. [illegible]

[illegible] Candido E.P. J. Biol. Chem. 268:19545-19551(1993). [ 8] Melnick L., Sherman F. J. Mol. Biol. 233:372-388(1993).

[1611] 710. VHS domain
[1612] Domain present in VPS-27, Hrs and STAM. Number of members: 27

[1613] 711. Vinculin family signatures

Vinculin [1] is a eukaryotic protein that seems to be [illegible] attachment of the actin-based microfilaments to the
plasma membrane. Vinculin is located at the cytoplasmic side of focal [illegible] plaques. In addition to actin,
vinculin interacts with other structural proteins such as [illegible] (about a 1000 residues). Structurally the protein
[illegible] of about 90 Kd separated from a basic C-terminal domain of about 25 Kd by a pro[illegible]. The central part
of the N-terminal domain consists of a variable number (3 [illegible] vertebrates, [illegible] Caenorhabditis elegans) of
repeats of a 110 amino acids domain. Catenins [2] are proteins [illegible] the cytoplasmic domain of a variety of
cadherins. The association of catenins to cadherins produces [illegible] filament network, and which seems to be of
primary importance for cadherins cell-adhesion properties. [illegible] different types of catenins seem to exist: alpha,
beta, and gamma. Alpha-catenins are proteins of about 100 Kd which are [illegible] related to vinculin. In terms of their
structure the most significant differences are the [illegible] of the proline-rich segment. Two signature patterns
[illegible] first pattern is [illegible]

[illegible] Cell Motil. Cytoskeleton 16:1-6[illegible]

[illegible] M., Kemler R. Proc. Natl. Acad. Sci. U.S.A. 88:9156-9160(1991).

[1614] 712. (Vitellogenin N) Lipoprot[illegible] triglyceride transfer protein and apolipoprotein B-[illegible]
[illegible] LV1n chain [illegible] lipovitellin, that [illegible]

[illegible] Structure 1998;6:895-909.

[1617] 713. (VMSA) Major surface antigen from hepadnavirus 

[1618] 714. ssDNA binding protein (Viral DNA bp)

This protein is found in herpesviruses and is needed for replication. 

[1619] 715. (Voltage CLC) Voltage [illegible] helices. Each protein forms a single pore. It [illegible]

[illegible] 1996;47:993-998.

Bork P, Rohde K;
Biochem J 1991;279:908-911.

1. RUGGERI, Z.M. and WARE, J.
von Willebrand factor.
FASEB J. 7 308-316 (1993).

[illegible]teins.

MATRIX 13 297-306 (1993).

3. PERKINS, S.J., SMITH, K.F., WILLIAMS, S.C., HARIS, P.I., CHAPMAN, D. and SIM, R.B.
The secondary structure of the von Willebrand factor type A domain in factor B of human complement by Fourier
transform infrared spectroscopy.
Its occurrence in collagen types VI, VII, XII and XIV, the integrins and other proteins by averaged structure
predictions.
J.MOL.BIOL. 238 104-119 (1994).

4. BORK, P. and ROHDE, K.
More von Willebrand factor type A domains? Sequence similarities with malaria thrombospondin-related
anonymous protein, dihydropyridine-sensitive calcium channel and inter-alpha-trypsin inhibitor.
BIOCHEM.J. 279 908-910 (1991).

5. EDWARDS, Y.J.K. and PERKINS, S.J.
The protein fold of the von Willebrand factor type A domain is predicted to be similar to the open twisted
beta-sheet flanked by alpha-helices found in human ras-p21.
FEBS LETT. 358 283-286 (1995).

6. LEE, J.O., RIEU, P., ARNAOUT, M.A. and LIDDINGTON, R.
Crystal structure of the A domain from the alpha subunit of integrin CR3 (CD11b/CD18).
CELL 80 631-638 (1995).

7. QU, A. and LEAHY, D.J.
Crystal structure of the I-domain from the CD11a/CD18 (LFA-1, alpha L beta 2) integrin.
PROC.NATL.ACAD.SCI.USA 92 10277-10281 (1995).

[1622] The von Willebrand factor is a large multimeric glycoprotein found in blood plasma. Mutant forms are involved
in the aetiology of bleeding disorders [1]. In von Willebrand factor, the type A domain (vWF) is the prototype for a protein
superfamily. The vWF domain is found in various plasma proteins: complement factors B, C2, CR3 and CR4; the
integrins (I-domains); collagen types VI, VII, XII and XIV; and other extracellular proteins [2-4]. Proteins that incorporate
vWF domains participate in numerous biological events (e.g., cell adhesion, migration, homing, pattern formation, and
signal transduction), involving interaction with a large array of ligands [2]. Secondary structure prediction from 75
aligned vWF sequences has revealed a largely alternating sequence of alpha-helices and beta-strands [3]. Fold
recognition algorithms were used to score sequence compatibility with a library of known structures: the vWF domain fold
was predicted to be a doubly-wound, open, twisted beta-sheet flanked by alpha-helices [5]. 3D structures have been
determined for the I-domains of integrins CD11b (with bound magnesium) [6] and CD11a (with bound manganese) [7].
The domain adopts a classic alpha/beta Rossmann fold and contains an unusual metal ion coordination site at its
surface. It has been suggested that this site represents a general metal ion-dependent adhesion site (MIDAS) for
binding protein ligands [6]. The residues constituting the MIDAS motif in the CD11b and CD11a I-domains are
completely conserved, but the manner in which the metal ion is coordinated differs slightly [7].

[1623] VWFADOMAIN is a 3-element fingerprint that provides a signature for the vWF domain superfamily. The
fingerprint was derived from an initial alignment of 14 sequences. Motif 1 includes the first beta-strand and 3 conserved
residues involved in metal ion coordination in I-domains (Asp and 2 serines in positions 8, 10 and 12, respectively);
motif 2 spans strands beta-2 and beta-2'; and motif 3 encodes beta-strand 3 and a conserved Asp (in position 7), which
coordinates the metal ion [6,7]. Three iterations on OWL27.0 were required to reach convergence, at which point a
true set comprising 56 sequences was identified. Numerous partial matches were also found. 
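
The distinction between full and partial fingerprint matches can be illustrated with a deliberately simplified sketch. It substitutes plain regular expressions for the aligned, scored motifs that a fingerprint database actually uses, and it counts a sequence as a full match only if all motifs occur in order. The three motif expressions are hypothetical placeholders: they place Asp and Ser at positions 8, 10 and 12 of motif 1 and Asp at position 7 of motif 3, as stated above, but are otherwise invented, as are the toy sequences.

```python
import re

# Hypothetical stand-ins for the three motifs (not the real fingerprint).
MOTIFS = [
    r"[LIV]{2}.{5}D.S.S",   # motif 1: Asp at position 8, Ser at 10 and 12
    r"G.T[NR]",             # motif 2: placeholder
    r"[LIV]{3}.{3}D",       # motif 3: Asp at position 7
]

def classify(sequence, motifs=MOTIFS):
    """Return 'full', 'partial' or 'none' for an ordered list of motif regexes."""
    pos, found = 0, 0
    for motif in motifs:
        # Search only downstream of the previous hit to enforce motif order.
        m = re.compile(motif).search(sequence, pos)
        if m:
            found += 1
            pos = m.end()
    if found == len(motifs):
        return "full"
    return "partial" if found else "none"

# Hypothetical toy sequences.
FULL = "MM" + "LVAAAAADGSAS" + "KK" + "GATN" + "KK" + "IVLAAAD" + "KK"
PARTIAL = "MM" + "LVAAAAADGSAS" + "KK" + "IVLAAAD" + "KK"
RESULTS = [classify(FULL), classify(PARTIAL), classify("MKKKK")]
```

In this toy setting the second sequence lacks motif 2 and is therefore reported as a partial match, which mirrors the way partial matches are listed separately from the true set.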
[1624] 717. (WD40) WD domain. G-beta repeat 

The ancient regulatory-protein family of WD-repeat proteins.
Neer EJ, Schmidt CJ, Nambudripad R, Smith TF;
Nature 1994;371:297-300.

Beta-transducin (G-beta) is one of the three subunits (alpha, beta, and gamma) of the guanine nucleotide-binding
proteins (G proteins) which act as intermediaries in the transduction of signals generated by transmembrane receptors
[1]. The alpha subunit binds to and hydrolyzes GTP; the functions of the beta and gamma subunits are less clear but
they seem to be required for the replacement of GDP by GTP as well as for membrane anchoring and receptor
recognition.

[1625] In higher eukaryotes G-beta exists as a small multigene family of highly conserved proteins of about 340 
shown [E1,2,3,4,5] to exist in a number of other proteins listed below:

- Yeast STE4, a component of the pheromone response pathway. STE4 is a G-beta like protein that associates with
[illegible].
- [illegible] beta proteins. It may also function in signal transduction.
- Chlamydomonas reinhardtii gblp. This protein is most probably the homolog of vertebrate protein 12.3.
- Human LIS1, a neuronal protein involved in type-1 lissencephaly [E2].
- Mammalian coatomer beta' subunit (beta'-COP), a component of a cytosolic protein complex that reversibly
associates with Golgi membranes to form vesicles that mediate biosynthetic protein transport.
- Yeast CDC4, essential for initiation of DNA replication and separation of the spindle pole bodies to form the poles
[illegible].
- [illegible] protein required for [illegible] microtubule [illegible] and chromosome separation.
- Yeast PWP1, a protein of unknown function.
- Yeast SKI8, a protein essential for controlling the propagation of double-stranded RNA.
- Yeast SOF1, a protein required for ribosomal RNA processing which associates with U3 small nucleolar RNA.
- Yeast TUP1 (also known as [illegible] or SFL2 or CYC9), a protein which has been implicated in dTMP uptake,
catabolite repression, mating sterility, and many other phenotypes.
- Yeast YCR57c, an ORF of unknown function from chromosome III.
- Yeast YCR72c, an ORF of unknown function from chromosome III.
- Slime mold coronin, an actin-binding protein.
- Slime mold AAC3, a developmentally regulated protein of unknown function.
- Drosophila protein Groucho (formerly known as E(spl); 'enhancer of split'), a protein involved in neurogenesis and
that seems to interact with the Notch and Delta proteins.
- Drosophila TAF-II-80, a protein that is tightly associated with TFIID.

Consensus pattern: [LIVMSTAC]-[LIVMFYWSTAGC]-[LIMSTAG]-[LIVMSTAGC]-x(2)-[illegible]-
x(2)-[LIVMWSTAC]-x-[LIVMFSTAG]-W-[DEN]-[LIVMFSTAGCN]
[ 1] Gilman A.G.
Annu. Rev. Biochem. 56:615-649(1987).
[ 2] Duronio R.J., Gordon J.I., Boguski M.S.
Proteins 13:41-56(1992).
[ 3] van der Voorn L., Ploegh H.L.
FEBS Lett. 307:131-134(1992).
[ 4] Neer E.J., Schmidt C.J., Nambudripad R., Smith T.F.
Nature 371:297-300(1994).
[ 5] Smith T.F., Gaitatzes C.G., Saxena K., Neer E.J.
Biochemistry In Press(1998).

RNA synthetases. This domain is present one to six times in the following enzymes:
- Mammalian multifunctional aminoacyl-tRNA synthetase. The domain is present three times in a region that
separates the N-terminal glutamyl-tRNA synthetase domain from the C-terminal prolyl-tRNA synthetase domain.
- Drosophila multifunctional aminoacyl-tRNA synthetase. The domain is present six times in the intercatalytic region.
- Mammalian tryptophanyl-tRNA synthetase. The domain is found at the N-terminal extremity.
- Mammalian, insect, nematode and plant glycyl-tRNA synthetase. The domain is found at the N-terminal extremity
[2].
- Mammalian histidyl-tRNA synthetase. The domain is found at the N-terminal extremity.

[1629] This domain, which is called WHEP-TRS, could contain a central alpha-helical region and may play a role in
the association of tRNA-synthetases into multienzyme complexes.

[1630] A signature pattern based on the first 29 positions of the WHEP domain has been developed.

- Consensus pattern: [QY]-G-[DNEA]-x-[LIV]-[KR]-x(2)-K-x(2)-[KRNG]-[AS]-x(4)-[LIV]-[DENK]-x(2)-[IV]-x(2)-L-x
(3)-K
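
Because the domain can occur from one to six times in a single enzyme, a scan should count every non-overlapping occurrence rather than stop at the first. The sketch below is one way to do that, assuming the two damaged residue classes in the printed pattern read [KR] and [AS]. The 29-residue unit, the spacer and the function name count_copies are hypothetical and serve only to exercise the expression.

```python
import re

# Hand-written regex for the WHEP-TRS consensus pattern, with the two
# damaged classes read as [KR] and [AS] (an assumption).
WHEP = re.compile(
    r"[QY]G[DNEA].[LIV][KR].{2}K.{2}[KRNG][AS].{4}[LIV][DENK].{2}[IV].{2}L.{3}K"
)

def count_copies(sequence):
    """Count non-overlapping matches of the signature in one sequence."""
    return sum(1 for _ in WHEP.finditer(sequence))

# Hypothetical 29-residue unit built piece by piece to follow the pattern.
UNIT = ("QGDALK" + "AA" + "K" + "AA" + "K" + "A" + "A" * 4
        + "LD" + "AA" + "I" + "AA" + "L" + "AAA" + "K")
LINKER = "MMMM"                      # hypothetical spacer between copies
TOY = LINKER.join([UNIT] * 3)        # toy protein carrying three copies
COPIES = count_copies(TOY)
```

The unit has exactly 29 residues, in line with the statement that the signature covers the first 29 positions of the domain.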

[ 1] Cerini C., Kerjan P., Astier M., Gratecos D., Mirande M., Semeriva M. EMBO J. 10:4267-4277(1991).
[ 2] Nada S., Chang P.K., Dignam J.D.
J. Biol. Chem. 268:7660-7667(1993).

[1631] 719. (Worm family 8) Putative membrane protein

Analysis of protein domain families in Caenorhabditis elegans.
Sonnhammer EL, Durbin R;
Genomics 1997;46:200-216.

This family, called family 8 in [1], may be a transmembrane protein.
The specific function of this protein is unknown.
[1632] 720. Xylose isomerase

Xylose isomerase (EC 5.3.1.5) [1] is an enzyme found in microorganisms which catalyzes the interconversion of
D-xylose to D-xylulose. It can also isomerize D-ribose to D-ribulose and D-glucose to D-fructose. Xylose isomerase seems
to require magnesium for its activity, while cobalt is necessary to stabilize the tetrameric structure of the enzyme. A
number of residues are conserved in all known xylose isomerases.

[1633] Xylose isomerase also exists in plants [2] where it is homodimeric and is manganese-dependent.
[1634] Two signature patterns for xylose isomerase have been developed. The first one is derived from a stretch
of five conserved amino acids that includes a glutamic acid residue known to be one of the four residues involved in
the binding of the magnesium ion [3]; this pattern also includes a lysine residue which is involved in the catalytic activity.
The second pattern is derived from a conserved region in the N-terminal section of the enzyme that includes a histidine
residue which has been shown [4] to be involved in the catalytic mechanism of the enzyme.
Consensus pattern: [LI]-E-P-K-P-x(2)-P
[E is a magnesium ligand]
[K is an active site residue]

Consensus pattern: [FL]-H-D-x-D-[LIV]-x-[PD]-x-[GDE]

[H is an active site residue]
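
When a pattern carries annotated residues, it is often useful to report where those residues fall in a matched sequence. A minimal sketch, assuming the first xylose isomerase pattern reads [LI]-E-P-K-P-x(2)-P, uses named groups to locate the magnesium-binding Glu and the catalytic Lys. The group names, the function name annotate and the toy sequence are illustrative assumptions.

```python
import re

# [LI]-E-P-K-P-x(2)-P with named groups on the two annotated residues.
SIG1 = re.compile(r"[LI](?P<mg_ligand>E)P(?P<active_site>K)P.{2}P")

def annotate(sequence):
    """Return 1-based positions of the annotated residues for each match."""
    out = []
    for m in SIG1.finditer(sequence):
        out.append({
            "match_start": m.start() + 1,
            "mg_ligand_E": m.start("mg_ligand") + 1,
            "active_site_K": m.start("active_site") + 1,
        })
    return out

TOY = "MAAALEPKPAAPGG"   # hypothetical sequence, not a real xylose isomerase
SITES = annotate(TOY)
```

Positions are reported with 1-based numbering, the usual convention for residue positions in protein sequences.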

[ 1] Dauter Z., Dauter M., Hemker J., Witzel H., Wilson K.S.
FEBS Lett. 247:1-8(1989).
[ 2] Kristo P.A., Saarelainen R., Fagerstrom R., Aho S., Korhola M.
Eur. J. Biochem. 237:240-246(1996).
[ 3] Henrick K., Collyer C.A., Blow D.M.
J. Mol. Biol. 208:129-157(1989).
[ 4] Vangrysperre W., Ampe C., Kersters-Hilderson H., Tempst P.
Biochem. J. 263:195-199(1989).

[1635] 721. XPG protein signatures. Xeroderma pigmentosum (XP) [1] is a human autosomal recessive disease,
characterized by a high incidence of sunlight-induced skin cancer. Skin cells of people with this condition are
hypersensitive to ultraviolet light, due to defects in the incision step of DNA excision repair. There are a minimum of seven
genetic complementation groups involved in this pathway: XP-A to XP-G. The defect in XP-G can be corrected by a
133 Kd nuclear protein called XPG (or XPGC) [2]. XPG belongs to a family of proteins [2,3,4,5,6] that are composed
of two main subsets: - Subset 1, to which belongs XPG, RAD2 from budding yeast and rad13 from fission yeast. RAD2
[illegible] DNA endonucleases [7,8]. XPG makes the 3' incision in human DNA nucleotide excision
repair [9]. - Subset 2, to which belongs mouse and human FEN-1, rad2 from fission yeast, and [illegible]
yeast. FEN-1 is a structure-specific endonuclease. In addition to the proteins listed in the above groups, this family
also includes: - Fission yeast exo1, a 5'->3' double-stranded DNA exonuclease that could act in a pathway that corrects
[illegible]. - [illegible] probably the same function as exo1. - Yeast DIN7. [illegible]
[illegible] the N-terminal extremity (N-region) and corresponds to the first 95 to 105 amino acids. The second region
is internal (I-region) and found towards the C-terminus; it spans about 140 residues and contains a highly conserved
core of 27 amino acids that includes a conserved pentapeptide (E-[illegible]-[DE]-A-[QS]). It is possible that the conserved
acidic residues are involved in the catalytic mechanism of DNA excision repair in XPG. The amino acids linking the N-
and I-regions are not conserved; indeed, they are largely absent from proteins belonging to the second subset. Two
signature patterns have been developed for these proteins. The first corresponds to the central part of the N-region,
the second to part of the I-region and includes the putative catalytic core pentapeptide.
[1636] Consensus pattern: [VI]-[KRE]-P-x-[FYIL]-V-F-D-G-x(2)-[PIL]-x-[LVC]-K-

[illegible] Nature 363:182-185(1993). [ 3] Carr A.M., Sheldrick K.S., Murray J.M., [illegible],
Watts F.Z., Lehmann A.R. Nucleic Acids Res. 21:1345-1349(1993). [ 4] Murray J.M., Tavassoli [illegible],
Lehmann A.R., Carr A.M., Watts F.Z. Mol. Cell. Biol. 14:4878-4888(1994). [ 5] Harrington J.J., Lieber M.R.
Genes Dev. [illegible]

[illegible] Prakash L., Prakash S. Nature 366:365-368(1993). [ 8] O'Donovan A., Scherly D., Clarkson S.G., Wood R.D. J. Biol.
Chem. 269:15965-15968(1994). [ 9] O'Donovan A., Davies A.A., Moggs J.G., West S.C., Wood R.D. Nature
[illegible]:432-435(1994).

Uric acid-xanthine permease (gene uapA) from Aspergillus nidulans.
Purine permease (gene uapC) from Aspergillus nidulans.
Xanthine permease from Bacillus subtilis (gene pbuX).
Uracil permease from Escherichia coli (gene uraA) [2] and Bacillus (gene pyrP).
Hypothetical protein ycdG from Escherichia coli.
Hypothetical protein ygfO from Escherichia coli.
Hypothetical protein ygfU from Escherichia coli.
Hypothetical protein yicE from Escherichia coli.
Hypothetical protein yunJ from Bacillus subtilis.
Hypothetical protein yunK from Bacillus subtilis.

[illegible] They are proteins of from 430 to 595 residues that seem to contain 12 transmembrane domains.
[illegible] corresponds with what seems to be the tenth transmembrane domain has been
selected as a signature pattern.

Consensus pattern: [LIVM]-P-x-[PASIF]-V-[LIVM]-G-G-x(4)-[LIVM]-[FY]-[GSA]-x-[LIVM]-x(3)-G

[ 1] Diallinas G., Gorfinkiel L., Arst G., Cecchetto G., Scazzocchio C. J. Biol. Chem. 270:8610-8622(1995).
[ 2] Andersen P.S., Frees D., Fast R., Mygind B. J. Bacteriol. 177:2008-2013(1995).

[illegible] synthases (EC 4.2.1.70) [1], have been
shown to share regions of similarities:

- Escherichia coli and Haemophilus influenzae ribosomal large subunit pseudouridine synthase A (gene rluA). It is
responsible for synthesis of pseudouridine from uracil-746 in 23S rRNA.
- Escherichia coli and Haemophilus influenzae ribosomal large subunit pseudouridine synthase C (gene rluC). It is
responsible for synthesis of pseudouridine from uracil at positions 955, 2504 and 2580 in 23S rRNA.
- Escherichia coli [illegible] and homologs in other bacteria large subunit pseudouridine synthase D (gene rluD).
Haemophilus influenzae hypothetical protein HI0042.
Aquifex aeolicus hypothetical protein AQ_1758.
Bacillus subtilis hypothetical protein yhcT.
Bacillus subtilis hypothetical protein yjbO. 
Bacillus subtilis hypothetical protein ylyB. 
Helicobacter pylori hypothetical protein HP0347. 
Helicobacter pylori hypothetical protein HP0745. 
Helicobacter pylori hypothetical protein HP0956. 
Mycoplasma genitalium hypothetical protein MG209. 
Mycoplasma genitalium hypothetical protein MG370. 
Synechocystis strain PCC 6803 hypothetical protein slr1592. 
Synechocystis strain PCC 6803 hypothetical protein slr1629. 
Yeast hypothetical protein YDL036c. 
Yeast hypothetical protein YGR169c.
Fission yeast hypothetical protein SpAC18B11.02c. 
Caenorhabditis elegans hypothetical protein K07E8.7. 

[1641] These are proteins of from 21 to 50 Kd which contain a number of conserved regions in their central section. 
They can be picked up in the database by the following highly conserved pattern. 

- Consensus pattern: [LIVCA]-[NHYT]-R-[LI]-D-x(2)-T-[STA]-G-[LIVAGC]-[LIVMF](2)-[LIVMFGC]-[SGTACV]

[1642] [ 1] Conrad J., Sun D., Englund N., Ofengand J. J. Biol. Chem. 273:18562-18566(1998). 
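
Sizes in this section are quoted sometimes in residues and sometimes in Kd. For a rough conversion between the two, a common rule of thumb is an average mass of about 110 Da per amino acid residue. That figure is an assumption introduced here for illustration, not a value given in the text, and real proteins deviate from it with their composition. A minimal sketch:

```python
# Rule-of-thumb average mass of one amino acid residue, in daltons.
# This value is an assumption and is not taken from the text above.
AVERAGE_RESIDUE_MASS_DA = 110.0

def approx_residues(mass_kd: float) -> int:
    """Estimate the number of residues in a protein of the given mass (Kd)."""
    return round(mass_kd * 1000 / AVERAGE_RESIDUE_MASS_DA)

def approx_mass_kd(n_residues: int) -> float:
    """Estimate the mass in Kd of a protein with the given number of residues."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000

# The 21 to 50 Kd range quoted above, expressed as approximate lengths.
LOW, HIGH = approx_residues(21), approx_residues(50)
```

Under this assumption the 21 to 50 Kd range corresponds to roughly 190 to 455 residues.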

[1643] In addition, the following bacterial proteins, which seem to belong to a family of pseudouridine synthases
(EC 4.2.1.70) [1], also have been shown to share regions of similarities:

Escherichia coli and Haemophilus influenzae 16S pseudouridylate 516 synthase (EC 4.2.1.70) (gene: rsuA). This
enzyme is responsible for the formation of pseudouridine from uracil-516 in 16S ribosomal RNA.
Escherichia coli hypothetical protein yciL and HI1199, the corresponding Haemophilus influenzae protein.

Escherichia coli hypothetical protein yjbC.

Escherichia coli hypothetical protein ymfC and HI0694, the corresponding Haemophilus influenzae protein.

Aquifex aeolicus hypothetical protein AQ_554. 

Aquifex aeolicus hypothetical protein AQ_1464. 

Bacillus subtilis hypothetical protein ypuL. 

Bacillus subtilis hypothetical protein ytzF. 

Borrelia burgdorferi hypothetical protein BB0129. 

Helicobacter pylori hypothetical protein HP1459. 

Synechocystis strain PCC 6803 hypothetical protein slr0361. 

Synechocystis strain PCC 6803 hypothetical protein slr0612. 

[1644] These are proteins of from 25 to 40 Kd which contain a number of conserved regions in their central section. 
They can be picked up in the database by the following highly conserved pattern. 

- Consensus pattern: G-R-L-D-x(2)-[STA]-x-G-[LIVFA]-[LIVMF](3)-[ST]-[DNST]

[1645] [ 1] Wrzesinski J., Bakin A., Nurse K., Lane B.G., Ofengand J. Biochemistry 34:8904-8913(1995).
[1646] 724. Zinc finger present in dystrophin, CBP/p300 
ZZ in dystrophin binds calmodulin 
Putative zinc finger; binding not yet shown. 
[1647] 725. Zinc carboxypeptidase 

There are a number of different types of zinc-dependent carboxypeptidases (EC 3.4.17.-) [1.2]. All these enzymes 
seem to be structurally and functionally related. The enzymes that belong to this family are listed below. 

Carboxypeptidase A1 (EC 3.4.17.1), a pancreatic digestive enzyme that can remove all C-terminal amino acids
with the exception of Arg, Lys and Pro.

Carboxypeptidase A2 (EC 3.4.17.15), a pancreatic digestive enzyme with a specificity similar to that of
carboxypeptidase A1, but with a preference for bulkier C-terminal residues.

Carboxypeptidase B (EC 3.4.17.2), also a pancreatic digestive enzyme, but that preferentially removes C-terminal [...]

Carboxypeptidase N (EC 3.4.17.3) (also known as arginine carboxypeptidase), a plasma enzyme which protects
the body from potent vasoactive and inflammatory peptides containing C-terminal Arg or Lys (such as
anaphylatoxins) which [...]

Carboxypeptidase H [...] (also known as [...] carboxypeptidase E), an enzyme located in secretory granules of
pancreatic islets, adrenal gland, pituitary and brain. This enzyme removes residual C-terminal Arg or Lys
remaining after initial endoprotease cleavage during prohormone processing.

Carboxypeptidase M (EC 3.4.17.12), a membrane bound Arg and Lys specific enzyme.
It is ideally situated to act on peptide hormones at local tissue sites where it could control their activity before or after
interaction with specific plasma membrane receptors.

Mast cell carboxypeptidase (EC 3.4.17.1), an enzyme with a specificity to carboxypeptidase A, but found in the [...]

Streptomyces griseus carboxypeptidase (Cpase SG) (EC 3.4.17.-) [3], which combines the specificities of
mammalian [...]

[...] carboxypeptidase T (EC 3.4.17.18) (CPT) [4], which also combines the specificities [...]

AEBP1 [...]
of other transcriptional proteins. 
Yeast hypothetical protein YHR132c. 

[1648] All of these enzymes bind an atom of zinc. Three conserved residues are implicated in the binding of the zinc
atom: two histidines and a glutamic acid. Two signature patterns which contain these three zinc ligands have been
derived.

- Consensus pattern: [PK]-x-[LIVMFY]-x-[LIVMFY]-[...]

- Consensus pattern: H-[STAG]-x(3)-[LIVME]-x([...] [H is a zinc ligand]

[ 1] Tan F., Chan S.J., Steiner D.F., Schilling J.W., Skidgel R.A.
J. Biol. Chem. 264:13165-13170(1989).
[ 2] Reynolds D.S., Stevens R.L., Gurley D.S., Lane W.S., Austen K.F., Serafin W.E.
J. Biol. Chem. 264:20094-20099(1989).
[ 3] Narahashi Y.
J. Biochem. 107:879-886(1990).
[ 4] Teplyakov A., Polyakov K., Obmolova G., Strokopytov B., Kuranova I.,
Osterman A.L., Grishin N.V., Smulevitch S.V., Zagnitko O.R.,
Galperina O.V., Matz M.V., Stepanov V.M.
Eur. J. Biochem. 208:281-288(1992).
[ 5] He G.-P., Muise A., Li A.W., Ro H.-S.
Nature 378:92-96(1995).
[ 6] Hourdou M.-L., Guinand M., Vacheron M.J., Michel G., Denoroy L.,
Duez C.M., Englebert S., Joris B., Weber G., Ghuysen J.-M.
Biochem. J. 292:563-570(1993).
[ 7] Rawlings N.D., Barrett A.J.
Meth. Enzymol. 248:183-228(1995).

[1649] 726. Zinc finger, C2H2 type
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tion factor TFIIIA. These domains have since been found in numerous nucleic acid-binding proteins, A zinc finger 
domain is composed of 25 to 30 amino-acid residues. There are two cysteine or histidine residues at both extremities
of the domain, which are involved in the tetrahedral coordination of a zinc atom. It has been proposed that such a 
domain interacts with about five nucleotides. A schematic representation of a zinc finger domain is shown below: 
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[1651] Many classes of zinc fingers are characterized according to the number and positions of the histidine and 
cysteine residues involved in the zinc atom coordination. In the first class to be characterized, called C2H2, the first
pair of zinc coordinating residues are cysteines, while the second pair are histidines. A number of experimental reports
have demonstrated the zinc-dependent DNA or RNA binding property of some members of this class.
[1652] Some of the proteins known to include C2H2-type zinc fingers are listed below. The number of zinc finger
regions found in each of these proteins is indicated between brackets; a '+' symbol indicates that only partial sequence
data is available and that additional finger domains may be present.

- Saccharomyces cerevisiae: ACE2 (3), ADR1 (2), AZF1 (4), FZF1 (5), MIG1 (2), MSN2 (2), MSN4 (2), RGM1 (2),
RIM1 (3), RME1 (3), SFP1 (2), SSL1 (1), STP1 (3), SWI5 (3), VAC1 (1) and ZMS1 (2).
- Emericella nidulans: brlA (2), creA (2).
- Drosophila: AEF-1 (4), Cf2 (7), ci-D (5), Disconnected (2), Escargot (5), Glass (5), Hunchback (6), Kruppel (5),
Kruppel-H (4+), Odd-skipped (4), Odd-paired (4), Pep (3), Snail (5), Spalt-major (7), Serendipity locus beta (6),
delta (7), h-1 (8), Suppressor of hairy wing su(Hw) (12), Suppressor of variegation suvar(3)7 (5), Teashirt (3) and
Tramtrack (2).
- Xenopus: transcription factor TFIIIA (9), p43 from RNP particle (9), Xfin (37), Xsna (5), gastrula XlcGF5.1 to
XlcGF71.1 (from 4+ to 11+), Oocyte XlcOF2 to XlcOF22 (from 7 to 12).
- Mammalian: basonuclin (6), BCL-6/LAZ-3 (6), erythroid krueppel-like transcription factor (3), transcription factors
Sp1 (3), Sp2 (3), Sp3 (3) and Sp4 (3), transcriptional repressor YY1 (4), Wilms' tumor protein (4), EGR1/Krox24
(3), EGR2/Krox20 (3), EGR3/Pilot (3), EGR4/AT133 (4), Evi-1 (10), GLI1 (5), GLI2 (4+), GLI3 (3+), HIV-EP1/ZNF40
(4), HIV-EP2 (2), KR1 (9+), KR2 (9), KR3 (15+), KR4 (14+), KR5 (11+), HF.12 (6+), REX-1 (4), ZfX (13), ZfY (13),
Zfp-35 (18), ZNF7 (15), ZNF8 (7), ZNF35 (10), ZNF42/MZF-1 (13), ZNF43 (22), ZNF46/Kup (2), ZNF76 (7), ZNF91
(36), ZNF133 (3).

[1653] In addition to the conserved zinc ligand residues it has been shown [6] that a number of other positions are
also important for the structural integrity of the C2H2 zinc fingers. The best conserved position is found four residues
after the second cysteine; it is generally an aromatic or aliphatic residue.

- Consensus pattern: C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H [The two C's and two H's are zinc ligands]
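
The C2H2 pattern can be written directly as a regular expression in which the four zinc ligands are captured, so that their positions are reported together with each hit. The sketch below is illustrative only: the name find_c2h2_fingers and the test fragment are assumptions made for this example and do not come from the specification.

```python
import re

# C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H, with the two C's and two H's captured.
C2H2 = re.compile(r"(C).{2,4}(C).{3}[LIVMFYWC].{8}(H).{3,5}(H)")

def find_c2h2_fingers(sequence):
    """Return, for each non-overlapping match, the 1-based positions of the C, C, H, H ligands."""
    hits = []
    for m in C2H2.finditer(sequence.upper()):
        hits.append([m.start(g) + 1 for g in (1, 2, 3, 4)])
    return hits

# Hypothetical finger-like fragment written for this example.
EXAMPLE = "YKCPECGKSFSQKSNLQKHQRTH"
print(find_c2h2_fingers(EXAMPLE))
```

Because finditer reports non-overlapping matches only, closely spaced alternative alignments of the same finger are not listed; a proteins with several tandem fingers yields one entry per finger.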



[ 1] Klug A., Rhodes D. 

Trends Biochem. Sci. 12:464-469(1987). 

[ 2] Evans R.M., Hollenberg S.M. 

Cell 52:1-3(1988). 

[ 3] Payre F., Vincent A. 

FEBS Lett. 234:245-250(1988). 

[ 4] Miller J., McLachlan A.D., Klug A. 

EMBO J. 4:1609-1614(1985). 

[5] Berg J.M. 

Proc. Natl. Acad. Sci. U.S.A. 85:99-102(1988). 

[6] Rosenfeld R.. Margalit H. 

J. Biomol. Struct. Dyn. 11:557-570(1993). 



[1654] 727. Zinc finger, C3HC4 type (RING finger)

A number of eukaryotic and viral proteins contain a conserved cysteine-rich domain of 40 to 60 residues (called C3HC4
zinc finger or RING finger) [1] that binds two atoms of zinc, and is probably involved in mediating protein-protein
interactions. The 3D structure of the zinc ligation system is unique to the [...] "cross-brace" motif. The spacing of
the cysteines in such a domain is C-x(2)-C-x(9 to 39)-C-x(1 to 3)-H-x(2 to 3)-C-x(2)-C-x[...]

Proteins currently known to include the C3HC4 domain are listed below (references are only provided for
recently determined sequences).

- Mammalian V(D)J recombination activating protein (gene RAG1). RAG1 activates the rearrangement of
immunoglobulin [...]

- [...] that regulates gene expression directed by the promoter region of the
interleukin-2 receptor alpha chain or the LTR promoter region of HIV-1.

- Human rfp. Rfp is a developmentally regulated protein that may function in male germ cell development.
Recombination of the N-terminal section of rfp with a protein tyrosine kinase produces the ret transforming protein.

- Human 52 Kd Ro/SS-A protein. A protein of unknown function from the Ro/SS-A ribonucleoprotein complex. Sera
from patients with systemic lupus erythematosus or primary Sjogren's syndrome often contain antibodies that react
with the Ro proteins.

- Human histocompatibility locus protein RING1.

- Human PML, a probable transcription factor. Chromosomal translocation of PML with retinoic receptor alpha
creates a fusion protein which is the cause of acute promyelocytic leukemia (APL).

- Mammalian breast cancer type 1 susceptibility protein (BRCA1) [E1].
- Mammalian cbl proto-oncogene.

- [...] recognizes and binds a specific DNA sequence.

- Mammalian peroxisome assembly factor-1 (PAF-1) (PMP35), which is somewhat involved in the biogenesis of
peroxisomes. In humans, defects in PAF-1 are responsible for a form of Zellweger syndrome, an autosomal
recessive disorder associated with peroxisomal deficiencies.

- Human MAT1 protein, which interacts with the CDK7-cyclin H complex.

- Human RING1 protein.

- Xenopus XNF7 protein, a probable transcription factor.

- Trypanosoma protein ESAG-8 (T-LR), which may be involved in the post-transcriptional regulation of genes in VSG
expression sites or may interact with adenylate cyclase to regulate its activity.

- Drosophila proteins Posterior Sex Combs (Psc) and Suppressor two of zeste (Su(z)2). The two proteins belong
to the Polycomb group of genes needed to maintain the segment-specific repression of homeotic selector genes.

- Drosophila protein male-specific msl-2, a DNA-binding protein which is involved in X chromosome dosage
compensation (the elevation of transcription of the male single X chromosome).

- Arabidopsis thaliana protein COP1, which is involved in the regulation of photomorphogenesis.
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Fungal DNA repair proteins RAD5, RAD16, RAD18 and radS. 

Herpesviruses trans-acting transcriptional protein ICP0/IE110. This protein, which has been characterized in many
different herpesviruses, is a trans-activator and/or -repressor of the expression of many viral and cellular promoters.
Baculoviruses protein CG30.
Baculoviruses major immediate early protein (PE-38).

Baculoviruses immediate-early regulatory protein IE-N/IE-2.

Caenorhabditis elegans hypothetical proteins F54G8.4, R05D3.4 and T02C1.1.

Yeast hypothetical proteins YER116c and YKR017c.

[1656] The central region of the domain was selected as a signature pattern for the C3HC4 finger.

- Consensus pattern: C-x-H-x-[LIVMFY]-C-x(2)-C-[LIVMYA]

[1657] [ 1] Borden K.L.B., Freemont P.S.
Curr. Opin. Struct. Biol. 6:395-401(1996).

[1658] 728. Zinc finger C-x8-C-x5-C-x3-H type (and similar). 
[1659] 729. Zinc finger, CCHC class 

A family of CCHC zinc fingers, mostly from retroviral gag proteins (nucleocapsid). Prototype structure is from HIV.
Also contains members involved in eukaryotic gene regulation, such as C. elegans GLH-1.
Structure is an 18-residue zinc finger; no examples of indels in the alignment.
[1660] 730. Zn-finger in Ran binding protein and others. 
[1661] 731. AN1-like Zinc finger

[1662] Zinc finger at the C-terminus of An1 Swiss:Q91889, a ubiquitin-like protein in Xenopus laevis. The following
pattern describes the zinc finger: C-X2-C-X(9-12)-C-X(1-2)-C-X4-C-X2-H-X5-H-X-C, where X can be any amino acid,
and numbers in brackets indicate the number of residues.
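
The zinc finger pattern above uses a slightly different notation from the consensus patterns elsewhere in this section (X2 for two residues, X(9-12) for a range). As a rough sketch, that notation can be tokenized and converted into a regular expression as follows. The names TOKEN and an1_notation_to_regex and the synthetic sequences are hypothetical, introduced only for this illustration.

```python
import re

# One token is either X(lo-hi), X followed by an optional count, or a single residue letter.
TOKEN = re.compile(r"X\((?P<lo>\d+)-(?P<hi>\d+)\)|X(?P<n>\d*)|(?P<res>[A-WYZ])")

def an1_notation_to_regex(pattern):
    """Convert a pattern such as C-X2-C-X(9-12)-C into a Python regex string."""
    out = []
    for tok in TOKEN.finditer(pattern):
        if tok.group("lo"):
            out.append(".{%s,%s}" % (tok.group("lo"), tok.group("hi")))
        elif tok.group("res"):
            out.append(tok.group("res"))
        else:
            # Bare X means exactly one residue; Xn means exactly n residues.
            out.append(".{%s}" % (tok.group("n") or "1"))
    return "".join(out)

AN1_PATTERN = "C-X2-C-X(9-12)-C-X(1-2)-C-X4-C-X2-H-X5-H-X-C"
AN1_REGEX = an1_notation_to_regex(AN1_PATTERN)

def synthetic_finger(gap):
    """Build an artificial sequence whose second spacer has the given length."""
    return "CAAC" + "A" * gap + "CACAAAACAAHAAAAAHAC"

print(AN1_REGEX)
print(bool(re.fullmatch(AN1_REGEX, synthetic_finger(9))))
```

The synthetic sequences only exercise the spacing rules; they are not real protein sequences.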

[1663] [1] Linnen JM, Bailey CP, Weeks DL; Gene 1993;128:181-188.
[1664] 732. 14-3-3 proteins 

Structure of a 14-3-3 protein and implications for coordination of multiple signalling pathways. 
Xiao B, Smerdon SJ, Jones DH, Dodson GG, Soneji Y, Aitken A, Gamblin SJ; Nature 1995;376:188-191.
Crystal structure of the zeta isoform of the 14-3-3 protein.
Liu D, Bienkowska J, Petosa C, Collier RJ, Fu H, Liddington R;
Nature 1995;376:191-194.

[1665] Interaction of 14-3-3 with signaling proteins is mediated by the recognition of phosphoserine. 
Muslin AJ, Tanner JW, Allen PM, Shaw AS; 
Cell 1996;84:889-897.

[1666] The 14-3-3 protein binds its target proteins with a common site located towards the C-terminus.
Ichimura T, Ito M, Itagaki C, Takahashi M, Horigome T, Omata S, Ohno S, Isobe T 
FEBS Lett 1997;413:273-276. 

[1667] Molecular evolution of the 14-3-3 protein family. 
Wang W, Shakes DC
J Mol Evol 1996;43:384-398.
Function of 14-3-3 proteins. 
Jin DY, Lyu MS, Kozak CA, Jeang KT 
Nature 1996;382:308-308. 

[1668] The 14-3-3 proteins [1,2,3] are a family of closely related acidic homodimeric proteins of about 30 Kd which
were first identified as being very abundant in mammalian brain tissues and located preferentially in neurons. The
14-3-3 proteins seem to have multiple biological activities and play a key role in signal transduction pathways and the
cell cycle. They interact with kinases such as PKC or Raf-1; they also seem to function as protein-kinase dependent
activators of tyrosine and tryptophan hydroxylases, and in plants they are associated with a complex that binds to the
G-box promoter elements.

[1669] The 14-3-3 family of proteins are ubiquitously found in all eukaryotic species studied and have been
sequenced in fungi (yeast BMH1 and BMH2, fission yeast rad24 and rad25), plants, Drosophila, and vertebrates. The
sequences of the 14-3-3 proteins are extremely well conserved. Two highly conserved regions have been selected as
signature patterns: the first is a peptide of 11 residues located in the N-terminal section; the second, a 20 amino acid
region located in the C-terminal section.

- Consensus pattern: R-N-L-[LIV]-S-[VG]-[GA]-Y-[KN]-N-[IVA]

- Consensus pattern: Y-K-[DE]-S-T-L-I-[IM]-Q-L-[LF]-[RHC]-D-N-[LF]-T-[LS]-W-[TAN]-[SAD]
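
Since the two signatures are expected in a fixed order (the 11-residue peptide in the N-terminal section, the 20-residue region in the C-terminal section), a simple check can test for both and for their relative order. The sketch below is an illustration under that assumption; check_14_3_3 and the toy sequence are hypothetical names and data, not part of the specification.

```python
import re

# N-terminal signature (11 residues) and C-terminal signature (20 residues).
SIG_N = re.compile(r"RNL[LIV]S[VG][GA]Y[KN]N[IVA]")
SIG_C = re.compile(r"YK[DE]STLI[IM]QL[LF][RHC]DN[LF]T[LS]W[TAN][SAD]")

def check_14_3_3(sequence):
    """Report the 1-based start of each signature and whether they occur in N-to-C order."""
    n_hit = SIG_N.search(sequence)
    c_hit = SIG_C.search(sequence)
    return {
        "n_terminal_start": n_hit.start() + 1 if n_hit else None,
        "c_terminal_start": c_hit.start() + 1 if c_hit else None,
        "ordered": bool(n_hit and c_hit and n_hit.start() < c_hit.start()),
    }

# Artificial sequence: both signatures separated by a glycine spacer.
TOY = "MAAA" + "RNLLSVAYKNV" + "G" * 10 + "YKDSTLIMQLLRDNLTLWTS" + "DE"
print(check_14_3_3(TOY))
```

A sequence carrying only one of the two signatures would be reported with "ordered" set to False, which may be useful for flagging partial sequences.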
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[ 1] Aitken A. 



[ 2] Morrison D.
Science 266:56-57(1994).
[ 3] Xiao B., Smerdon S.J., Jones D.H., Dodson G.G., Soneji Y., Aitken A., Gamblin S.J.
Nature 376:188-191(1995).

[1670] 733. D-isomer specific 2-hydroxyacid dehydrogenases (2 HacidDH)

This family [...] Formate dehydrogenase, D-glycerate [...]

A number of NAD-dependent 2-hydroxyacid dehydrogenases which seem to be specific for the D-isomer of
their substrate have been shown [1,2,3,4] to be functionally and structurally related. These enzymes are listed below.

- D-lactate dehydrogenase (EC 1.[...]
- [...]
- D-2-hydroxyisocaproate dehydrogenase (EC [...]) (D-hicDH), a bacterial enzyme that catalyzes the reversible
and stereospecific interconversion between 2-ketocarboxylic acids and [...]
- [...] from the bacteria Pseudomonas sp. 101 and various fungi [...]
vancomycin binding.

Escherichia coli hypothetical protein ycdW. 
Escherichia coli hypothetical protein yiaE. 
Haemophilus influenzae hypothetical protein HI1556. 
Yeast hypothetical protein YER081w. 
Yeast hypothetical protein YIL074w. 

[1671] All these enzymes have similar enzymatic activities and are structurally related. Three of the most conserved
[...] mechanism.

- Consensus pattern: [LIVMA]-[AG]-[IVT]-[LIVMFY]-[AG]-x-G-[NHKRQGSAC]-[LIV]-G-x(13,14)-[LIVfMT]-x(2)-[FY[...]
- Consensus pattern: [LIVMFYWA]-[LIVFYWC]-[...]
- Consensus pattern: [...]-[LIVMC]-[DNV]

[1] Grant G.A. Biochem. Biophys. Res. Commun. 165:1371-1374(1989).
[2] Kochhar S., Hunziker P., Leong-Morgenthaler P.M., Hottinger H. Biochem. Biophys. Res. Commun. 184:60-66
(1992).
[3] Ohta T., Taguchi H. J. Biol. Chem. 266:12588-12594(1991).
[4] Goldberg J.D., Yoshida T., Brick P. J. Mol. Biol. 236:1123-1140(1994).
[5] Popov V.O., Lamzin V.S. Biochem. J. 301:625-643(1994).

2.6 angstroms resolution.
Mattevi A, Obmolova G, Kalk KH, Westphal AH, De Kok A, Hol WG.
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J Mol Biol 1993;230:1183-1199. 

These proteins contain one to three copies of a lipoyl binding domain followed by the catalytic domain.
[1673] 735. 3-beta hydroxysteroid dehydrogenase/isomerase family
Structure and tissue-specific expression of 3 beta-hydroxysteroid dehydrogenase/5-ene-4-ene isomerase genes in
human and rat classical and peripheral steroidogenic tissues.
Labrie F, Simard J, Luu-The V, Pelletier G, Belanger A,
Lachance Y, Zhao HF, Labrie C, Breton N, de Launoit Y, et al
J Steroid Biochem Mol Biol 1992;41:421-435.
The enzyme 3 beta-hydroxysteroid dehydrogenase/5-ene-4-ene isomerase (3 beta-HSD) catalyzes the oxidation and
isomerization of 5-ene-3 beta-hydroxypregnene and 5-ene-hydroxyandrostene steroid precursors into the
corresponding 4-ene-ketosteroids necessary for the formation of all classes of steroid hormones.

[1674] 736. 3-hydroxyacyl-CoA dehydrogenase 

This family also includes lambda crystallin. 
Structure of L-3-hydroxyacyl-coenzyme A dehydrogenase:
preliminary chain tracing at 2.8-A resolution.
Birktoft JJ, Holden HM, Hamlin R, Xuong NH, Banaszak LJ;
Proc Natl Acad Sci U S A 1987;84:8262-8266.

[1675] 3-hydroxyacyl-CoA dehydrogenase (EC 1.1.1.35) (HCDH) [1] is an enzyme involved in fatty acid metabolism.
It catalyzes the oxidation of 3-hydroxyacyl-CoA to 3-oxoacyl-CoA. Most eukaryotic cells have 2 fatty-acid beta-oxidation
systems, one located in mitochondria and the other in peroxisomes. In peroxisomes 3-hydroxyacyl-CoA dehydrogenase
forms, with enoyl-CoA hydratase (ECH) and 3,2-trans-enoyl-CoA isomerase (ECI), a multifunctional enzyme where the
N-terminal domain bears the hydratase/isomerase activities and the C-terminal domain the dehydrogenase activity.
There are two mitochondrial enzymes: one which is monofunctional and the other which is, like its peroxisomal
counterpart, multifunctional.

[1676] In Escherichia coli (gene fadB) and Pseudomonas fragi (gene faoA) HCDH is part of a multifunctional enzyme
which also contains an ECH/ECI domain as well as a 3-hydroxybutyryl-CoA epimerase domain [2]. 
[1677] The other proteins structurally related to HCDH are: 

- Bacterial 3-hydroxybutyryl-CoA dehydrogenase (EC 1.1.1.157) which oxidizes 3-hydroxybutanoyl-CoA to
acetoacetyl-CoA [3].

- Eye lens protein lambda-crystallin [4], which is specific to lagomorphs (such as rabbit).

There are two major regions of similarity in the sequences of proteins of the HCDH family: the first one, located in the
N-terminal, corresponds to the NAD-binding site; the second one is located in the center of the sequence. A signature
pattern has been derived from this central region.

- Consensus pattern: [DNE]-x(2)-[GA]-F-[LIVMFY]-x-[NT]-R-x(3)-[PA]-[LIVMFY](2)-x(5)-[LIVMFYCT]-[LIVMFY]-x(2)-[GV]

[1] Birktoft J.J., Holden H.M., Hamlin R., Xuong N.-H., Banaszak L.J. Proc. Natl. Acad. Sci. U.S.A. 84:8262-8266
(1987).
[2] Nakahigashi K., Inokuchi H. Nucleic Acids Res. 18:4937-4937(1990).
[3] Mullany P., Clayton C.L., Pallen M.J., Slone R., Al-Saleh A., Tabaqchali S. FEMS Microbiol. Lett. 124:61-67
(1994).
[4] Mulders J.W.M., Hendriks W., Blankesteijn W.M., Bloemendal H., de Jong W.W. J. Biol. Chem. 263:
15462-15466(1988).

[1678] 737. 60s Acidic ribosomal protein 
Proteins P1, P2, and P0, components of the eukaryotic
ribosome stalk. New structural and functional aspects.
Remacha M, Jimenez-Diaz A, Santos C, Briones E, Zambrano R,
Rodriguez Gabriel MA, Guarinos E, Ballesta JP;
Biochem Cell Biol 1995;73:959-968.
This family includes archaebacterial L12, eukaryotic P0, P1 and P2.

[1679] 738. 6-phosphogluconate dehydrogenases 

6-phosphogluconate dehydrogenase (EC 1.1.1.44) (6PGD) catalyzes the third step in the hexose monophosphate
shunt, the decarboxylating reduction of 6-phosphogluconate into ribulose 5-phosphate.
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of e-pho'sphogluconate has been selected as a signature pattern. 
- Consensus pattern: [LIVM]-x-D-x(2)-[GA]-[NQS]-K-G-T-G-x-W

[ 1] Reizer A., Deutscher J., Saier M.H. Jr., Reizer J.
Mol. Microbiol. 5:1081-1089(1991).
[ 2] Adams M.J., Archibald I.G., Bugg C.E., Carne A., Gover S.,
Helliwell J.R., Pickersgill R.W., White S.W.
EMBO J. 2:1009-1014(1983).

- 5-hydroxytryptamine (serotonin) 1A to 1F, 2A to 2C, 4, 5A, 5B, 6 and 7 [5].
- Acetylcholine, muscarinic-type, M1 to M5.
- Adenosine A1, A2A, A2B and A3 [6].
- Adrenergic alpha-1A to -1C; alpha-2A to -2D; beta-1 to -3 [7].
- Angiotensin II types I and II.
- Bombesin subtypes 3 and 4.
- Bradykinin B1 and B2.
- C3a and C5a anaphylatoxin.
- Cannabinoid CB1 and CB2.
- Chemokines C-C CC-CKR-1 to CC-CKR-8.
- Chemokines C-X-C CXC-CKR-1 to CXC-CKR-4.
- Cholecystokinin-A and cholecystokinin-B/gastrin.
- Dopamine D1 to D5 [8].
- Endothelin ET-a and ET-b [9].
- fMet-Leu-Phe (fMLP) (N-formyl peptide).
- Follicle stimulating hormone (FSH-R) [10].
- Galanin.
- Gastrin-releasing peptide (GRP-R).
- Gonadotropin-releasing hormone (GNRH-R).
- Histamine H1 and H2 (gastric receptor I).
- Lutropin-choriogonadotropic hormone (LSH-R) [10].
- Melanocortin MC1R to MC5R.
- Melatonin.
- Neuromedin B (NMB-R).
- Neuromedin K (NK-3R).
- Neuropeptide Y types 1 to 6.
- Neurotensin (NT-R).
- Octopamine (tyramine), from insects.
- Odorants [11].
- Opioids delta-, kappa- and mu-types [12].
- Oxytocin (OT-R).
- Platelet activating factor (PAF-R).
- Prostacyclin.
- Prostaglandin D2.
- Prostaglandin E2, EP1 to EP4 subtypes.
- Prostaglandin F2.
- Purinoreceptors (ATP) [13].
- Somatostatin types 1 to 5.
- Substance-K (NK-2R).
- Substance-P (NK-1R).
- Thrombin.
- Thromboxane A2.
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- Thyrotropin (TSH-R) [10). 
- Thyrotropin releasing factor (TRH-R).
- Vasopressin V1a, V1b and V2.
- Visual pigments (opsins and rhodopsin) [14].
- Proto-oncogene mas.
- A number of orphan receptors (whose ligand is not known) from mammals and birds.
- Caenorhabditis elegans putative receptors C06G4.5, C38C10.1, C43C3.2, T27D1.3 and ZC84.4.
- Three putative receptors encoded in the genome of cytomegalovirus: US27, US28, and UL33.
- ECRF3, a putative receptor encoded in the genome of herpesvirus saimiri.

[1682] The structure of all these receptors is thought to be identical. They have seven hydrophobic regions, each of
which most probably spans the membrane. The N-terminus is located on the extracellular side of the membrane and
is often glycosylated, while the C-terminus is cytoplasmic and generally phosphorylated. Three extracellular loops
alternate with three intracellular loops to link the seven transmembrane regions. Most, but not all, of these receptors
lack a signal peptide. The most conserved parts of these proteins are the transmembrane regions and the first two
cytoplasmic loops. A conserved acidic-Arg-aromatic triplet is present in the N-terminal extremity of the second
cytoplasmic loop [15] and could be implicated in the interaction with G proteins.

[1683] To detect this widespread family of proteins, a pattern that contains the conserved triplet and that also spans
the major part of the third transmembrane helix has been developed.

- Consensus pattern: [GSTALIVMFYWC]-[GSTANCPDE]-{EDPKRH}-x(2)-[LIVMNQGA]-x(2)-[LIVMFT]-[GSTANC]-
[LIVMFYWSTAC]-[DENH]-R-[FYWCSH]-x(2)-[LIVM]
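
The third element of this pattern, {EDPKRH}, excludes residues rather than listing allowed ones, which corresponds to a negated character class in a regular expression. The following sketch shows one way to locate the pattern and pull out the acidic-Arg-aromatic triplet. The function name find_triplet is hypothetical, and the rhodopsin-like fragment is written out here for illustration only and is not taken from the specification.

```python
import re

GPCR = re.compile(
    r"[GSTALIVMFYWC][GSTANCPDE][^EDPKRH].{2}[LIVMNQGA].{2}[LIVMFT][GSTANC]"
    r"[LIVMFYWSTAC](?P<triplet>[DENH]R[FYWCSH]).{2}[LIVM]"
)

def find_triplet(sequence):
    """Return (1-based position, residues) of the conserved triplet, or None if no match."""
    m = GPCR.search(sequence)
    if m is None:
        return None
    return m.start("triplet") + 1, m.group("triplet")

# Illustrative fragment resembling the end of a third transmembrane helix (assumption).
FRAGMENT = "GGEIALWSLVVLAIERYVVVCKPM"
print(find_triplet(FRAGMENT))
```

Only the first match is reported; a sequence that fails the excluded-residue position is rejected even when the triplet itself is present.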

[ 1] Strosberg A.D.
Eur. J. Biochem. 196:1-10(1991).
[ 2] Kerlavage A.R.
Curr. Opin. Struct. Biol. 1:394-401(1991).
[ 3] Probst W.C., Snyder L.A., Schuster D.I., Brosius J., Sealfon S.C.
DNA Cell Biol. 11:1-20(1992).
[ 4] Savarese T.M., Fraser C.M.
Biochem. J. 283:1-9(1992).
[ 5] Branchek T.
Curr. Biol. 3:315-317(1993).
[ 6] Stiles G.L.
J. Biol. Chem. 267:6451-6454(1992).
[ 7] Frielle T., Kobilka B.K., Lefkowitz R.J., Caron M.G.
Trends Neurosci. 11:321-324(1988).
[ 8] Stevens C.F.
Curr. Biol. 1:20-22(1991).
[ 9] Sakurai T., Yanagisawa M., Masaki T.
Trends Pharmacol. Sci. 13:103-107(1992).
[10] Salesse R., Remy J.J., Levin J.M., Jallal B., Garnier J.
Biochimie 73:109-120(1991).
[11] Lancet D., Ben-Arie N.
Curr. Biol. 3:668-674(1993).
[12] Uhl G.R., Childers S., Pasternak G.
Trends Neurosci. 17:89-93(1994).
[13] Barnard E.A., Burnstock G., Webb T.E.
Trends Pharmacol. Sci. 15:67-70(1994).
[14] Applebury M.L., Hargrave P.A.
Vision Res. 26:1881-1895(1986).
[15] Attwood T.K., Eliopoulos E.E., Findlay J.B.C.
Gene 98:153-159(1991).

[1684] (7tm 1) Visual pigments (opsins) retinal binding site

Visual pigments [1,2] are the light-absorbing molecules that mediate vision. They consist of an apoprotein, opsin,
covalently linked to the chromophore cis-retinal. Vision is effected through the absorption of a photon by cis-retinal
which is isomerized to trans-retinal. This isomerization leads to a change of conformation of the protein. Opsins are
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r-, ^oci 1^ w«r*oKro+oQ foi If different Diaments are generally found. Koa ceiis, wniun mcu.a ^ a ^ 

[...] pigment, rhodopsin. Cone cells, which function in bright light, are responsible for color vision and [...]
more color pigments (for example, in mammals: red [...] ommatidium contains eight photoreceptor
cells [...] R7 and R8) expresses a specific opsin. [...] retinochrome, also known as retinal photoisomerase, [...]
transmembrane helix. The pattern that has been developed includes this residue.

- Consensus pattern: [LIVMWAC]-[PGC]-x(3)-[SAC]-K-[STALIMR]-[GSACPNV]-[STACP]-[...]-[IY]
[K is the retinal binding site]

[ 1] Applebury M.L., Hargrave P.A. 
Vision Res. 26:1881-1895(1986). 
20 [ 2] Fryxell K.J., Meyerowitz E.M. 

J. Mol. Evol. 33:367-378(1991). 

[ 3] Shen D., Jiang M.. Hao W.. Tao L., Salazar M., Fong H.K.W. 
Biochemistry 33:13117-13125(1994). 

[1689] The following descriptions of protein family functions are not provided by the Pfam or Prosite databases.
[1690] 740. BAH 

BAH domain. Number of members: 65

[1691] 741. ELM2
ELM2 domain. The ELM2 (Egl-27 and MTA1 homology 2) domain is a small domain of unknown function. Number of
members: 10

[1692] 742. Euk porin. EUKARYOTIC_PORIN
The major protein of the outer mitochondrial membrane of eukaryotes [...] that behaves as a general diffusion pore
[...] the mitochondrial outer membrane. Yeast contains two members of this family (genes POR1 [...]
[...] [LIVMY]

[ 1] Benz R. Biochim. Biophys. Acta 1197:167-196(1994).
[ 2] Mannella C.A. Trends Biochem. Sci. 17:315-320(1992).
[ 3] Dihanich M. Experientia 46:146-153(1990).
[ 4] Forte M., Guy H.R., Mannella C.A. J. Bioenerg. Biomembr. [...]
[ 5] Sampson M.J., Lovell R.S., Davison D.B., Craigen W.J. Genomics 36:192-196(1996).

[1693] 743. Glyco hydro 19
Chitinases family 19 signatures
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in chitin polymers. From the view point of sequence similarity chitinases belong to either family 18 or 19 in the classi- 
fication of glycosyl hydrolases [2,E1]. Chitinases of family 19 (also known as classes lA or 1 and IB or II) are enzymes 
from plants that function in the defense against fungal and insect pathogens by destroying their chitin-containing cell
wall. Class IA/I and IB/II enzymes differ in the presence (IA/I) or absence (IB/II) of a N-terminal chitin-binding domain
(see the relevant entry <PDOC00025>). The catalytic domain of these enzymes consists of about 220 to 230 amino
acid residues.

Two highly conserved regions were selected as signature patterns; the first one is located in the N-terminal section
and contains one of the six cysteines which are conserved in most, if not all, of these chitinases and which is probably
involved in a disulfide bond.
Consensus pattern: C-x(4,5)-F-Y-[ST]-x(3)-[FY]-[LIVMF]-x-A-x(3)-[YF]-x(2)-F-[GSA]
[1694] Consensus pattern: [LIVM]-[GSA]-F-x-[STAG](2)-[LIVMFY]-W-[FY]-W-[LIVM]

[ 1] Flach J., Pilet P.-E., Jolles P. Experientia 48:701-716(1992).
[ 2] Henrissat B. Biochem. J. 280:309-316(1991).

[1695] 744. MBD 
Methyl-CpG binding domain 

The Methyl-CpG binding domain (MBD) binds to DNA that contains one or more symmetrically methylated CpGs [1].
DNA methylation in animals is associated with alterations in chromatin structure and silencing of gene expression.
MBD has negligible non-specific affinity for DNA. In vitro foot-printing with MeCP2 showed the MBD can protect a 12
nucleotide region surrounding a methyl CpG pair [1]. MBDs are found in several Methyl-CpG binding proteins and also
DNA demethylase [2]. Number of members: 11

[1] Medline: 94232813. Dissection of the methyl-CpG binding domain from the chromosomal protein MeCP2. Nan
X, Meehan RR, Bird A; Nucleic Acids Res 1993;21:4886-4892.

[2] Medline: 99158138. A mammalian protein with specific demethylase activity for mCpG DNA. Bhattacharya SK,
Ramchandani S, Cervoni N, Szyf M; Nature 1999;397:579-583.

[1696] 745. Peptidase C1 
Eukaryotic thiol (cysteine) proteases active sites

cross-reference(s) THIOL_PROTEASE_CYS; THIOL_PROTEASE_HIS;
THIOL_PROTEASE_ASN

Eukaryotic thiol proteases (EC 3.4.22.-) [1] are a family of proteolytic enzymes which contain an active site cysteine.
Catalysis proceeds through a thioester intermediate and is facilitated by a nearby histidine side chain; an asparagine
completes the essential catalytic triad. The proteases which are currently known to belong to this family are listed below
(references are only provided for recently determined sequences).

- Vertebrate lysosomal cathepsins B (EC 3.4.22.1), H (EC 3.4.22.16), L (EC 3.4.22.15), and S (EC 3.4.22.27) [2].
- Vertebrate lysosomal dipeptidyl peptidase I (EC 3.4.14.1) (also known as cathepsin C) [2].
- Vertebrate calpains (EC 3.4.22.17). Calpains are intracellular calcium-activated thiol proteases that contain both
a N-terminal catalytic domain and a C-terminal calcium-binding domain.
- Mammalian cathepsin K, which seems involved in osteoclastic bone resorption [3].
- Human cathepsin O [4].
- Bleomycin hydrolase. An enzyme that catalyzes the inactivation of the antitumor drug BLM (a glycopeptide).
- Plant enzymes: barley aleurain (EC 3.4.22.16), EP-B1/B4; kidney bean EP-C1, rice bean SH-EP; kiwi fruit actinidin
(EC 3.4.22.14); papaya latex papain (EC 3.4.22.2), chymopapain (EC 3.4.22.6), caricain (EC 3.4.22.30), and
proteinase IV (EC 3.4.22.25); pea turgor-responsive protein 15A; pineapple stem bromelain (EC 3.4.22.32); rape
COT44; rice oryzain alpha, beta, and gamma; tomato low-temperature induced; Arabidopsis thaliana A494, RD19A
and RD21A.
- House-dust mites allergens DerP1 and EurM1.
- Cathepsin B-like proteinases from the worms Caenorhabditis elegans (genes gcp-1, cpr-3, cpr-4, cpr-5 and
cpr-6), Schistosoma mansoni (antigen SM31) and japonica (antigen SJ31), Haemonchus contortus (genes AC-1 and
AC-2), and Ostertagia ostertagi (CP-1 and CP-3).
- Slime mold cysteine proteinases CP1 and CP2.
- Cruzipain from Trypanosoma cruzi and brucei.
- Trophozoite cysteine proteinase (TCP) from various Plasmodium species.
- Proteases from Leishmania mexicana, Theileria annulata and Theileria parva.
- Baculoviruses cathepsin-like enzyme (v-cath).
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. Drosophila small optic lobes protein (gene sol), a neuronal protein that contains a calpain^ike domain. 

- Yeast thiol protease BLH1/YCP1/LAP3.
- Caenorhabditis elegans hypothetical protein C06G4.2, a calpain-like protein.

[1697] Two bacterial peptidases are also part of this family:

- Aminopeptidase C from Lactococcus lactis (gene pepC) [5]. 

- Thiol protease tpr from Porphyromonas gingivalis.

[1698] Three other proteins are structurally related to this family, but may have lost their proteolytic activity. 

- Soybean oil body protein P34. This protein has its active site cysteine replaced by a glycine. 

- Rat testin, a Sertoli cell secretory protein highly similar to cathepsin L but with the active site cysteine replaced
by a serine. Rat testin should not be confused with mouse testin which is a LIM-domain protein (see [...]).

- Plasmodium falciparum serine-repeat protein (SERA), the major blood stage antigen. This protein of 111 Kd
possesses a C-terminal thiol-protease-like domain [6], but the active site cysteine is replaced by a serine.

The sequences around the three active site residues are well conserved and can be used as signature patterns. 

[1699] Consensus pattern Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]-[STAGCV] [C is the active site residue]

Note the residue in position 4 of the pattern is almost always cysteine; the only exceptions are calpains (Leu), bleomycin
hydrolase (Ser) and yeast YCP1 (Ser). Note the residue in position 5 of the pattern is always Gly except in papaya

pSSseliwtJelr;^^^ 

Srse^tsSwCHl-IWH^^^^^ 

Note these proteins belong to family C1 (papain-type) and C2 (calpains) in the classification of peptidases [7,E1].
[1] Dufour E. Biochimie 70:1335-1342(1988).
[2] Kirschke H., Barrett A.J., Rawlings N.D. Protein Prof. 2:1587-1643(1995).
[3] Shi G.-P., Chapman H.A., Bhairi S.M., Deleeuw C., Reddy V.Y., Weiss S.J. FEBS Lett. 357:129-134(1995).
[4] Velasco G., Ferrando A.A., Puente X.S., Sanchez L.M., Lopez-Otin C. J. Biol. Chem. 269:27136-27142(1994).
[5] Chapot-Chartier M.P., Nardi M., Chopin M.C., Chopin A., Gripon J.C. Appl. Environ. Microbiol. 59:330-333(1993).
[6] Higgins D.G., McConnell D.J., Sharp P.M. Nature 340:604-604(1989).
[7] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994).

[1700] 746. Peptidase M22 

Glycoprotease family signature cross-reference(s): GLYCOPROTEASE

Glycoprotease (GCP) (EC 3.4.24.57) [1], or O-sialoglycoprotein endopeptidase, is a metalloprotease secreted by
Pasteurella haemolytica which specifically cleaves O-sialoglycoproteins such as glycophorin A. The sequence of GCP is
highly similar to the following uncharacterized proteins:

- Escherichia coli hypothetical protein ygjD (ORF-X).
- Bacillus subtilis hypothetical protein ydiE.
- Mycobacterium leprae hypothetical protein U229E.
- Mycobacterium tuberculosis hypothetical protein MtCY7B.10.
- Synechocystis strain PCC 6803 hypothetical protein slr0807.
- Methanococcus jannaschii hypothetical protein MJ1130.
- Haloarcula marismortui hypothetical protein in HSH 3' region.
- Yeast hypothetical protein YKR038c.
- Yeast hypothetical protein QRI7.

[1701] One of the conserved regions contains two conserved histidines. It is possible that this region is involved in 
Note these proteins belong to family M22 in the classification of peptidases [2,E1].
[1] Abdullah K.M., Lo R.Y.C., Mellors A. J. Bacteriol. 173:5597-5603(1991).
[2] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[1703] 747. SAM. SAM domain (Sterile alpha motif) 
It has been suggested that SAM is an evolutionarily conserved protein binding domain that is involved in the regulation
of numerous developmental processes in diverse eukaryotes. The SAM domain can potentially function as a protein
interaction module through its ability to homo- and heterooligomerise with other SAM domains. Number of members: 81

[1] Medline: 96100659 SAM: A novel motif in yeast sterile alpha and Drosophila polyhomeotic proteins. Ponting CP;
Prot Sci 1995;4:1928-1930.

[2] Medline: 97160498 SAM as a protein interaction domain involved in developmental regulation. Schultz J, Ponting
CP, Hofmann K, Bork P; Prot Sci 1997;6:249-253.

[3] Medline: 99101382 The crystal structure of an Eph receptor SAM domain reveals a mechanism for modular
dimerization. Stapleton D, Balan I, Pawson T, Sicheri F; Nat Struct Biol 1999;6:44-49.

[1704] 748. Tyrosinase signatures cross-reference(s): TYROSINASE_1; TYROSINASE_2
Tyrosinase (EC 1.14.18.1) [1] is a copper monooxygenase that catalyzes the hydroxylation of monophenols and the
oxidation of o-diphenols to o-quinols. This enzyme, found in prokaryotes as well as in eukaryotes, is involved in the
formation of pigments such as melanins and other polyphenolic compounds.
[1705] Tyrosinase binds two copper ions (CuA and CuB). Each of the two copper ions has been shown [2] to be bound
by three conserved histidine residues. The regions around these copper-binding ligands are well conserved and also
shared by some hemocyanins, which are copper-containing oxygen carriers from the hemolymph of many molluscs
and arthropods [3,4].

[1706] At least two proteins related to tyrosinase are known to exist in mammals: 

- TRP-1 (TYRP1) [5], which is responsible for the conversion of 5,6-dihydroxyindole-2-carboxylic acid (DHICA) to
indole-5,6-quinone-2-carboxylic acid.

- TRP-2 (TYRP2) [6], which is the melanogenic enzyme DOPAchrome tautomerase (EC 5.3.3.12) that catalyzes
the conversion of DOPAchrome to DHICA. TRP-2 differs from tyrosinases and TRP-1 in that it binds two zinc ions
instead of copper [7].

[1707] Other proteins that belong to this family are: 

- Plants polyphenol oxidases (PPO) (EC 1.10.3.1) which catalyze the oxidation of mono- and o-diphenols to
o-diquinones [8].

Caenorhabditis elegans hypothetical protein C02C2.1. 

[1708] Two signature patterns for tyrosinase and related proteins have been derived. The first one contains two of
the histidines that bind CuA, and is located in the N-terminal section of tyrosinase. The second pattern contains a
histidine that binds CuB; that pattern is located in the central section of the enzyme.

Consensus pattern H-x(4,5)-F-[LIVMFTP]-x-[FW]-H-R-x(2)-[LM]-x(3)-E [The two H's are copper ligands]
[1709] Consensus pattern D-P-x-F-[LIVMFYW]-x(2)-H-x(3)-D [H is a copper ligand]
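
The consensus patterns in this description follow the PROSITE notation: elements are separated by '-', 'x' stands for any residue, square brackets list the residues accepted at a position, curly braces list residues that are excluded, and a number or range in parentheses is a repeat count. The following is a minimal illustrative sketch, not part of the original disclosure, of how such a pattern could be translated into a regular expression and used to scan a sequence. The function name prosite_to_regex and the test peptide are hypothetical, and the sketch assumes patterns without the '<' and '>' terminal anchors.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        element = element.strip()
        # Separate an optional repeat count such as "(2)" or "(4,5)".
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"empty pattern element in {pattern!r}")
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            token = "."                        # any residue
        elif core.startswith("[") and core.endswith("]"):
            token = core                       # accepted residues
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"    # excluded residues
        elif len(core) == 1 and core.isalpha() and core.isupper():
            token = core                       # a single fixed residue
        else:
            raise ValueError(f"unsupported pattern element: {element!r}")
        if lo is not None:
            token += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(token)
    return "".join(parts)


# Second tyrosinase signature (the CuB-binding histidine).
cub_regex = prosite_to_regex("D-P-x-F-[LIVMFYW]-x(2)-H-x(3)-D")

# Hypothetical peptide written by hand so that it satisfies the pattern.
toy_peptide = "AADPAFLAAHAAADAA"
hit = re.search(cub_regex, toy_peptide)
print(cub_regex, hit.start() + 1 if hit else None)
```

The same helper can be applied to the first (CuA) pattern; a match only indicates that the signature is present and says nothing by itself about enzymatic activity.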

[1] Lerch K. Prog. Clin. Biol. Res. 256:85-98(1988).
[2] Jackman M.R., Hajnal A., Lerch K. Biochem. J. 274:707-713(1991).
[3] Linzen B. Naturwissenschaften 76:206-211(1989).
[4] Lang W.H., van Holde K.E. Proc. Natl. Acad. Sci. U.S.A. 88:244-248(1991).
[5] Kobayashi T., Urabe K., Winder A., Jimenez-Cervantes C., Imokawa G., Brewington T., Solano F., Garcia-Borron J.C.,
Hearing V.J. EMBO J. 13:5818-5825(1994).
[6] Jackson I.J., Chambers D.M., Tsukamoto K., Copeland N.G., Gilbert D.J., Jenkins N.A., Hearing V. EMBO J.
11:527-535(1992).
[7] Solano F., Martinez-Liarte J.H., Jimenez-Cervantes C., Garcia-Borron J.C., Lozano J.A. Biochem. Biophys.
Res. Commun. 204:1243-1250(1994).
[8] Cary J.W., Lax A.R., Flurkey W.H. Plant Mol. Biol. 20:245-253(1992).

[1710] 749. (Mur Ligase) Folylpolyglutamate synthase signatures 

Folylpolyglutamate synthase (EC 6.3.2.17) (FPGS) [1] is the enzyme of folate metabolism that catalyzes ATP-dependent
addition of glutamate moieties to tetrahydrofolate.
[1711] Its sequence is moderately conserved between prokaryotes (gene folC) and eukaryotes. We developed two
signature patterns based on the conserved regions, which are rich in glycine residues and could play a role in the
catalytic activity and/or in substrate binding.

Description of pattern(s) and/or profile(s)

Consensus pattern [LIVMFY]-x-[LIVM]-[STAG]-G-T-[NK]-G-K-x-[ST]-x(7)-[LIVM](2)-x(3)-[GSK]

Consensus pattern [LIVMFY](2)-E-x-G-[LIVM]-[GA]-G-x(2)-D-x-[GST]-x-[LIVM](2)

[1712] [1] Shane B., Garrow T., Brenner A., Chen U., Choi Y.J., Hsu J.C., Stover P. Adv. Exp. Med. Biol. 338:629-634(1993).

[1713] 750. (Peptidase M3) Neutral zinc metallopeptidases, zinc-binding region signature

The majority of zinc-dependent metallopeptidases (with the notable exception of the carboxypeptidases) share a
common pattern of primary structure [1,2,3] in the part of their sequence involved in the binding of zinc, and can be grouped
together as a superfamily, known as the metzincins, on the basis of this sequence similarity. They can be classified into
a number of distinct families [4,E1] which are listed below along with the proteases which are currently known to belong
to these families.
[1714] Family M1

- Bacterial aminopeptidase N (EC 3.4.11.2) (gene pepN).
- Mammalian glutamyl aminopeptidase (aminopeptidase A). It may play a role in regulating growth
and differentiation of early B-lineage cells.
- Yeast aminopeptidase yscII (gene APE2).
- Yeast alanine/arginine aminopeptidase (gene AAP1).
- Leukotriene A-4 hydrolase. This enzyme is responsible for the hydrolysis of an epoxide moiety of
LTA-4 to form LTB-4; it has been shown that it binds zinc and is capable of peptidase activity.

[1715] Family M2

- Angiotensin-converting enzyme (EC 3.4.15.1) (dipeptidyl carboxypeptidase I) (ACE), the enzyme responsible for
hydrolyzing angiotensin I to angiotensin II. There are two forms of ACE: a testis-specific isozyme and a somatic
isozyme which has two active centers.

[1716] Family M3 

- Thimet oligopeptidase (EC 3.4.24.15), a mammalian enzyme involved in the cytoplasmic degradation of small
peptides.
- Neurolysin (EC 3.4.24.16) (also known as mitochondrial oligopeptidase M or microsomal endopeptidase).
- Mitochondrial intermediate peptidase, which is involved in the processing
of some proteins imported in the mitochondrion.
- Yeast saccharolysin (EC 3.4.24.37) (proteinase yscD).
- Escherichia coli and related bacteria dipeptidyl carboxypeptidase (EC 3.4.15.5) (gene dcp).
- Escherichia coli and related bacteria oligopeptidase A (EC 3.4.24.70) (gene opdA or prlC).
- Yeast hypothetical protein YKL134c.

[1717] Family M4 

- Thermostable thermolysins (EC 3.4.24.27), and related thermolabile neutral proteases (bacillolysins) (EC
3.4.24.28) from various species of Bacillus.
- Pseudolysin (EC 3.4.24.26) from Pseudomonas aeruginosa (gene lasB).
- Extracellular elastase from Staphylococcus epidermidis.
- Extracellular protease prt1 from Erwinia carotovora.
- Extracellular minor protease smp from Serratia marcescens.
- Vibriolysin (EC 3.4.24.25) from various species of Vibrio.
- Protease prtA from Listeria monocytogenes.
- Extracellular proteinase proA from Legionella pneumophila.

[1718] Family M5

Mycolysin (EC 3.4.24.31) from Streptomyces cacaoi.
[1719] Family M6 

Immune inhibitor A from Bacillus thuringiensis (gene ina). Ina degrades two classes of insect antibacterial proteins, 
attacins and cecropins. 

[1720] Family M7 

Streptomyces extracellular small neutral proteases 

[1721] Family M8 

Leishmanolysin (EC 3.4.24.36) (surface glycoprotein gp63), a cell surface protease from various species of
Leishmania.

[1722] Family M9 

Microbial collagenase (EC 3.4.24.3) from Clostridium perfringens and Vibrio alginolyticus. 

[1723] Family M10A

Serralysin (EC 3.4.24.40), an extracellular metalloprotease from Serratia.
Alkaline metalloproteinase from Pseudomonas aeruginosa (gene aprA).
Secreted proteases A, B, C and G from Erwinia chrysanthemi.
Yeast hypothetical protein YIL108w.

[1724] Family M10B 

Mammalian extracellular matrix metalloproteinases (known as matrixins) [5]: MMP-1 (EC 3.4.24.7) (interstitial
collagenase), MMP-2 (EC 3.4.24.24) (72 Kd gelatinase), MMP-9 (EC 3.4.24.35) (92 Kd gelatinase), MMP-7 (EC
3.4.24.23) (matrilysin), MMP-8 (EC 3.4.24.34) (neutrophil collagenase), MMP-3 (EC 3.4.24.17) (stromelysin-1),
MMP-10 (EC 3.4.24.22) (stromelysin-2), MMP-11 (stromelysin-3), and MMP-12 (EC 3.4.24.65) (macrophage
metalloelastase).

Sea urchin hatching enzyme (envelysin) (EC 3.4.24.12). A protease that allows the embryo to digest the protective
envelope derived from the egg extracellular matrix.
Soybean metalloendoproteinase 1. 

[1725] Family M11 

Chlamydomonas reinhardtii gamete lytic enzyme (GLE).
[1726] Family M12A

Astacin (EC 3.4.24.21), a crayfish endoprotease. 

Meprin A (EC 3.4.24.18), a mammalian kidney and intestinal brush border metalloendopeptidase.

Bone morphogenic protein 1 (BMP-1), a protein which induces cartilage and bone formation and which expresses
metalloendopeptidase activity. The Drosophila homolog of BMP-1 is the dorsal-ventral patterning protein tolloid.

Blastula protease 10 (BP10) from Paracentrotus lividus and the related protein SpAN from Strongylocentrotus
purpuratus.

Caenorhabditis elegans protein toh-2.
Caenorhabditis elegans hypothetical protein F42A10.8.

Choriolysins L and H (EC 3.4.24.67) (also known as embryonic hatching proteins LCE and HCE) from the fish
Oryzias latipes. These proteases participate in the breakdown of the egg envelope, which is derived from the
egg extracellular matrix, at the time of hatching.

[1727] Family M12B

Snake venom metalloproteinases [6]. This subfamily mostly groups proteases that act in hemorrhage. Examples include
trimerelysin I (EC 3.4.24.52) and II (EC 3.4.24.53).
Mouse cell surface antigen MS2.

[1728] Family M13

endopeptidase. 
- Peptidase O from Lactococcus lactis (gene pepO). 

[1729] Family M27 

Clostridial neurotoxins, including tetanus toxin (TeTx) and the various botulinum toxins (BoNT). These toxins are
zinc proteases that block neurotransmitter release by proteolytic cleavage of synaptic proteins such as
synaptobrevins, syntaxin and SNAP-25 [7,8].

[1730] Family M30 

Staphylococcus hyicus neutral metalloprotease. 

[1731] Family M32

- Thermostable carboxypeptidase 1 (EC 3.4.17.19) (carboxypeptidase Taq), an enzyme from Thermus aquaticus
which is most active at high temperature.

[1732] Family M34 

- Lethal factor (LF) from Bacillus anthracis, one of the three proteins composing the anthrax toxin.
[1733] Family M35 

- Deuterolysin (EC 3.4.24.39) from Penicillium citrinum and related proteases from various species of Aspergillus. 
[1734] Family M36 

- Extracellular elastinolytic metalloproteinases from Aspergillus. 

[1735] From the tertiary structure of thermolysin, the position of the residues acting as zinc ligands and those involved
in the catalytic activity are known. Two of the zinc ligands are histidines which are very close together in the sequence;
C-terminal to the first histidine is a glutamic acid residue which acts as a nucleophile and promotes the attack of a
water molecule on the carbonyl carbon of the substrate. A signature pattern which includes the two histidine and the
glutamic acid residues is sufficient to detect this superfamily of proteins.

[The two H's are zinc ligands] [E is the active site residue]

Sequences known to belong to this class detected by the pattern: ALL, except for members of families M5, M7 and M11.

Other sequence(s) detected in SWISS-PROT: 55, including Neurospora crassa conidiation-specific protein 13, which
could be a zinc-protease.

[1] Jongeneel C.V., Bouvier J., Bairoch A. FEBS Lett. 242:211-214(1989).
[2] Murphy G.J.P., Murphy G., Reynolds J.J. FEBS Lett. 289:4-7(1991).
[3] Bode W., Grams R., Reinemer P., Gomis-Rueth F.-X., Baumann U., McKay D.B., Stoecker W. Zoology 99:237-246(1996).
[4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).
[5] Woessner J. Jr. FASEB J. 5:2145-2154(1991).
[6] Hite L.A., Fox J.W., Bjarnason J.B.
[7] Montecucco C., Schiavo G. Trends Biochem. Sci. 18:324-327(1993).
[8] Niemann H., Blasi J., Jahn R. Trends Cell Biol. 4:179-185(1994).

[1737] 751. PseudoU_synt_1

tRNA pseudouridine synthase is involved in the formation of pseudouridine at the anticodon stem and loop of transfer
RNAs. Pseudouridine is an isomer of uridine (5-(beta-D-ribofuranosyl)uracil), and is the most abundant modified
nucleoside found in all cellular RNAs. The TruA-like proteins also exhibit a conserved sequence with a strictly conserved
aspartic acid, likely involved in catalysis. Number of members: 25

[1738] [1] Medline: 98254513. Transfer RNA-pseudouridine synthetase Pus1 of Saccharomyces cerevisiae contains
one atom of zinc essential for its native conformation and tRNA recognition. Arluison V, Hountondji C, Robert B,
Grosjean H; Biochemistry 1998;37:7268-7276.
[1739] 752. EPSP synthase signatures 

EPSP synthase (3-phosphoshikimate 1-carboxyvinyltransferase) (EC 2.5.1.19) catalyzes the sixth step in the
biosynthesis from chorismate of the aromatic amino acids (the shikimate pathway) in bacteria (gene aroA), plants and fungi
(where it is part of a multifunctional enzyme which catalyzes five consecutive steps in this pathway) [1]. EPSP synthase
has been extensively studied as it is the target of the potent herbicide glyphosate, which inhibits the enzyme.

[1740] The sequence of EPSP from various biological sources shows that the structure of the enzyme has been well
conserved throughout evolution. Two conserved regions were selected as signature patterns. The first pattern
corresponds to a region that is part of the active site and which is also important for the resistance to glyphosate [2]. The
second pattern is located in the C-terminal part of the protein and contains a conserved lysine which seems to be
important for the activity of the enzyme.

[1741] Description of pattern(s) and/or profile(s)

[1742] Consensus pattern [LIVM]-x(2)-[GN]-N-[SA]-G-T-[STA]-x-R-x-[LIVMY]-x-[GSTA]
Consensus pattern [KR]-x-[KH]-E-[CST]-[DNE]-R-[LIVM]-x-[STA]-[LIVMC]-x(2)-[EN]-[LIVMF]-x-[KRA]-[LIVMF]-G

[1] Stallings W.C., Abdel-Megid S.S., Lim L.W., Shieh H.-S., Dayringer H.E., Leimgruber N.K., Stegeman R.A.,
Anderson K.S., Sikorski J.A., Padgette S.R., Kishore G.M. Proc. Natl. Acad. Sci. U.S.A. 88:5046-5050(1991).
[2] Padgette S.R., Re D.B., Gaser C.S., Eicholtz D.A., Frazier R.B., Hironaka C.M., Levine E.B., Shah D.M., Fraley
R.T., Kishore G.M. J. Biol. Chem. 266:22364-22369(1991).

[1743] 753. Glyco_hydro_18

Glycosyl hydrolases family 18. Number of members: 173

[1] Medline: 95219379. Crystal structure of a bacterial chitinase at 2.3 A resolution. Perrakis A, Tews I, Dauter Z,
Oppenheim AB, Chet I, Wilson KS, Vorgias CE; Structure 1994;2:1169-1180.
[1744] 754. Esterase
Putative esterase

This family contains Esterase D Swiss:P10768. However it is not clear if all members of the family have the same
function. This family is possibly related to the COesterase family.
Number of members: 36

[1745] 755. (HMA) Heavy-metal-associated domain 

A conserved domain of about 30 amino acid residues has been found [1] in a number of proteins that transport or 
detoxify heavy metals. This domain contains two conserved cysteines that could be involved in the binding of these 
metals. The domain has been termed Heavy-Metal-Associated (HMA). It has been found in: 

- A variety of cation transport ATPases (E1-E2 ATPases) (see <PDOC00139>). The human copper ATPases ATP7A
and ATP7B, which are respectively involved in Menkes' and Wilson's diseases. ATP7A and ATP7B both contain 6
tandem copies of the HMA domain. The copper ATPases CCC2 from budding yeast, copA from Enterococcus
faecalis and synA from Synechococcus contain one copy of the HMA domain. The cadmium ATPases cadA from
Bacillus firmus and from plasmid pI258 from Staphylococcus aureus also contain a single HMA domain, while a
chromosomal Staphylococcus aureus cadA contains two copies. Other, less characterized ATPases that contain
the HMA domain are: fixI from Rhizobium meliloti, pacS from Synechococcus strain 7942, Mycobacterium
leprae ctpA and ctpB and Escherichia coli hypothetical protein yhhO. In all these ATPases the HMA domain(s) are
located in the N-terminal section.

- Mercuric reductase (EC 1.16.1.1) (gene merA), which is generally encoded by plasmids carried by mercury-resistant
Gram-negative bacteria. Mercuric reductase is a class-1 pyridine nucleotide-disulphide oxidoreductase (see
<PDOC00073>). There is generally one HMA domain (with the exception of a chromosomal merA from Bacillus
strain RC607 which has two) in the N-terminal part of merA.

- Mercuric transport protein periplasmic component (gene merP), also encoded by plasmids carried by mercury-resistant
Gram-negative bacteria. It seems to be a mercury scavenger that specifically binds to Hg(2+)
and which passes it to the mercuric reductase via the merT protein. The N-terminal half of merP is a HMA domain.
- Helicobacter pylori copper-binding protein copP.

- Yeast protein ATX1 [2], which could act in the transport and/or partitioning of copper. 

[1746] The consensus pattern for HMA spans the complete domain. 

[1747] Description of pattern(s) and/or profile(s)

Consensus pattern [LIVN]-x(2)-[LIVMFA]-x-C-x-[STAGCDNH]-C-x(3)-[LIVFG]-x(3)-[LIV]-x(9,11)-[I^^^^^^

[The two C's probably bind metals]

[1] Bull P.C., Cox D.W. Trends Genet. 10:246-252(1994).

[ 2]Lin S.-J., Culotta V.L. Proc. Natl. Acad. Sci. U.S.A. 92:3784-3788(1995). 

[1748] 756. (Peptidase M10) Matrixins cysteine switch

PROSITE cross-reference(s): CYSTEINE_SWITCH

Mammalian extracellular matrix metalloproteinases (EC 3.4.24.-), also known as matrixins [1] (see <PDOC00129>),
are zinc-dependent enzymes. They are secreted by cells in an inactive form (zymogen) that differs from the mature
enzyme by the presence of an N-terminal propeptide. A highly conserved octapeptide is found in the vicinity
of the C-terminal end of the propeptide. This region has been shown to be involved in autoinhibition of matrixins [2,3]:
a cysteine within the octapeptide chelates the active site zinc ion, thus inhibiting the enzyme. This region has been
called the 'cysteine switch' or 'autoinhibitor region'.

A cysteine switch has been found in the following zinc proteases: 

- MMP-1 (EC 3.4.24.7) (interstitial collagenase).
- MMP-2 (EC 3.4.24.24) (72 Kd gelatinase).
- MMP-3 (EC 3.4.24.17) (stromelysin-1).
- MMP-7 (EC 3.4.24.23) (matrilysin).
- MMP-8 (EC 3.4.24.34) (neutrophil collagenase).
- MMP-9 (EC 3.4.24.35) (92 Kd gelatinase).
- MMP-10 (EC 3.4.24.22) (stromelysin-2).
- MMP-11 (EC 3.4.24.-) (stromelysin-3).
- MMP-12 (EC 3.4.24.65) (macrophage metalloelastase).
- MMP-13 (EC 3.4.24.-) (collagenase 3).
- MMP-14 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 1).
- MMP-15 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 2).
- MMP-16 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 3).
- Sea urchin hatching enzyme (EC 3.4.24.12) (envelysin) [4].
- Chlamydomonas reinhardtii gamete lytic enzyme (GLE) [5].

[1749] Description of pattern(s) and/or profile(s)

Consensus pattern P-R-C-[GN]-x-P-[DR]-[LIVSAPKQ] [C chelates the zinc ion]
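
Because the annotation names one specific residue of the octapeptide (the zinc-chelating cysteine, the third position of the pattern), a scan can report that residue's position as well as the position of the motif. The sketch below is illustrative only: the regular expression is a hand translation of the consensus pattern above, and the function name find_cysteine_switches and the toy propeptide are hypothetical rather than taken from this disclosure.

```python
import re

# Hand translation of P-R-C-[GN]-x-P-[DR]-[LIVSAPKQ] into a regular expression.
CYSTEINE_SWITCH = re.compile(r"PRC[GN].P[DR][LIVSAPKQ]")


def find_cysteine_switches(sequence: str):
    """Return (motif_start, cysteine_position, motif) tuples, 1-based positions."""
    hits = []
    for m in CYSTEINE_SWITCH.finditer(sequence.upper()):
        start = m.start() + 1          # convert 0-based index to 1-based position
        cysteine = start + 2           # the cysteine is the third residue of the motif
        hits.append((start, cysteine, m.group()))
    return hits


# Hypothetical propeptide fragment written by hand to contain one switch.
toy_propeptide = "MKSLPRCGVPDVAQF"
for start, cys, motif in find_cysteine_switches(toy_propeptide):
    print(f"motif {motif} at {start}, putative zinc-chelating Cys at {cys}")
```

Positions are reported 1-based, the usual convention for residue numbering; a hit is only a candidate and does not by itself show that the cysteine contacts the zinc ion.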

[1] Woessner J. Jr. FASEB J. 5:2145-2154(1991).
[2] Sanchez-Lopez R., Nicholson R., Gesnel M.C., Matrisian L.M., Breathnach R. J. Biol. Chem. 263:11892-11899(1988).
[3] Park A.J., Matrisian L.M., Kells A.F., Pearson R., Yuan Z., Navre M. J. Biol. Chem. 266:1584-1590(1991).
[4] Lepage T., Gache C. EMBO J. 9:3003-3012(1990).
[5] Kinoshita T., Fukuzawa H., Shimada T., Saito T., Matsuda Y. Proc. Natl. Acad. Sci. U.S.A. 89:4693-4697(1992).
[1750] 757. (Peptidase S8) Serine proteases, subtilase family, active sites 

PROSITE cross-reference(s): PS00136; SUBTILASE_ASP, PS00137; SUBTILASE_HIS, PS00138; SUBTILASE_SER
Subtilases [1,2] are an extensive family of serine proteases whose catalytic activity is provided by a charge relay system
similar to that of the trypsin family of serine proteases but which arose by independent convergent evolution. The
sequences around the residues involved in the catalytic triad (aspartic acid, serine and histidine) are completely different
from those of the analogous residues in the trypsin serine proteases and can be used as signatures specific to that
category of proteases.
The subtilase family currently includes the following proteases:

- Subtilisins (EC 3.4.21.62); these alkaline proteases from various Bacillus species have been the target of numerous
studies in the past thirty years.
- Alkaline elastase YaB from Bacillus sp. (gene ale).
- Alkaline serine exoprotease A from Vibrio alginolyticus (gene proA).
- Aqualysin I from Thermus aquaticus (gene pstI).
- AspA from Aeromonas salmonicida.
- Bacillopeptidase F (esterase) from Bacillus subtilis (gene bpf).
- C5A peptidase from Streptococcus pyogenes (gene scpA).
- Cell envelope-located proteases PI, PII, and PIII from Lactococcus lactis.
- Extracellular serine protease from Serratia marcescens.
- Extracellular protease from Xanthomonas campestris.
- Intracellular serine protease (ISP) from various Bacillus.
- Minor extracellular serine protease epr from Bacillus subtilis (gene epr).
- Minor extracellular serine protease vpr from Bacillus subtilis (gene vpr).
- Nisin leader peptide processing protease nisP from Lactococcus lactis.
- Serotype-specific antigen 1 from Pasteurella haemolytica (gene ssa1).
- Thermitase (EC 3.4.21.66) from Thermoactinomyces vulgaris.
- Calcium-dependent protease from Anabaena variabilis (gene prcA).
- Halolysin from halophilic bacteria sp. 172p1 (gene hly).
- Alkaline extracellular protease (AEP) from Yarrowia lipolytica (gene xpr2).
- Alkaline proteinase from Cephalosporium acremonium (gene alp).
- Cerevisin (EC 3.4.21.48) (vacuolar protease B) from yeast (gene PRB1).
- Cuticle-degrading protease (pr1) from Metarhizium anisopliae.
- KEX-1 protease from Kluyveromyces lactis.
- Kexin (EC 3.4.21.61) from yeast (gene KEX-2).
- Oryzin (EC 3.4.21.63) (alkaline proteinase) from Aspergillus (gene alp).
- Proteinase K (EC 3.4.21.64) from Tritirachium album (gene proK).
- Proteinase R from Tritirachium album (gene proR).
- Proteinase T from Tritirachium album (gene proT).
- Subtilisin-like protease III from yeast (gene YSP3).
- Thermomycolin (EC 3.4.21.65) from Malbranchea sulfurea.
- Furin (EC 3.4.21.85), neuroendocrine convertases 1 to 3 (NEC-1 to -3) and PACE4 protease from mammals, other
vertebrates, and invertebrates. These proteases are involved in the processing of hormone precursors at sites
comprised of pairs of basic amino acid residues [3].
- Tripeptidyl-peptidase II (EC 3.4.14.10) (tripeptidyl aminopeptidase) from human.
- Prestalk-specific proteins tagB and tagC from slime mold [4]. Both proteins consist of two domains: an N-terminal
subtilase catalytic domain and a C-terminal ABC transporter domain (see <PDOC00185>).

[1751] Description of pattern(s) and/or profile(s)

Consensus pattern [STAIV]-x-[LIVMF]-[LIVM]-D-[DSTA]-G-[LIVMFC]-x(2,3)-[DNH] [D is the active site residue]
Consensus pattern H-G-[STM]-x-[VIC]-[STAGC]-[GS]-x-[LIVMA]-[STAGCLV]-[SAGM] [H is the active site residue]
Consensus pattern G-T-S-x-[SA]-x-P-x(2)-[STAVC]-[AG] [S is the active site residue]

Note if a protein includes at least two of the three active site signatures, the probability of it being a serine protease
from the subtilase family is 100%.

Note these proteins belong to family S8 in the classification of peptidases [5,E1].
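
The two-of-three rule stated in the note above can be expressed as a short check. This is a sketch under stated assumptions, not a procedure taken from the disclosure: the three regular expressions are hand translations of the consensus patterns above, the names subtilase_signatures and looks_like_subtilase are hypothetical, and the test fragments are synthetic strings written to satisfy the patterns rather than real subtilase sequences.

```python
import re

# Hand translations of the three active site signatures listed above.
SUBTILASE_SIGNATURES = {
    "ASP": re.compile(r"[STAIV].[LIVMF][LIVM]D[DSTA]G[LIVMFC].{2,3}[DNH]"),
    "HIS": re.compile(r"HG[STM].[VIC][STAGC][GS].[LIVMA][STAGCLV][SAGM]"),
    "SER": re.compile(r"GTS.[SA].P.{2}[STAVC][AG]"),
}


def subtilase_signatures(sequence: str):
    """Return the names of the active site signatures found in the sequence."""
    seq = sequence.upper()
    return [name for name, rx in SUBTILASE_SIGNATURES.items() if rx.search(seq)]


def looks_like_subtilase(sequence: str) -> bool:
    """Apply the rule from the note: at least two of the three signatures."""
    return len(subtilase_signatures(sequence)) >= 2


# Synthetic fragments, one per signature, joined by arbitrary spacer residues.
asp_fragment = "SVLVDDGLAAD"
his_fragment = "HGTAVAGAIAS"
toy_sequence = "MK" + asp_fragment + "QQQ" + his_fragment + "QQQ"
print(subtilase_signatures(toy_sequence), looks_like_subtilase(toy_sequence))
```

The confidence attached to a positive result is the one stated in the note itself; the code only counts which signatures are present.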

[1] Siezen R.J., de Vos W.M., Leunissen J.A.M., Dijkstra B.W. Protein Eng. 4:719-737(1991).
[2] Siezen R.J. (In) Proceeding subtilisin symposium, Hamburg, (1992).
[3] Barr P.J. Cell 66:1-3(1991).
[4] Shaulsky G., Kuspa A., Loomis W.F. Genes Dev. 9:1111-1122(1995).
[5] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[1752] 758. (SSB) Single-strand binding protein family signatures 
PROSITE cross-reference(s): PS00735; SSB_1, PS00736; SSB_2

The Escherichia coli single-strand binding protein [1] (gene ssb), also known as the helix-destabilizing protein, is a
protein of 177 amino acids. It binds tightly, as a homotetramer, to single-stranded DNA (ss-DNA) and plays an important
role in DNA replication, recombination and repair.
[1753] Closely related variants of SSB are encoded in the genome of a variety of large self-transmissible plasmids.
SSB has also been characterized in bacteria such as Proteus mirabilis or Serratia marcescens.
[1754] Eukaryotic mitochondrial proteins that bind ss-DNA and are probably involved in mitochondrial DNA replication
are structurally and evolutionarily related to prokaryotic SSB. Proteins currently known to belong to this subfamily are
listed below [2].

- Mammalian protein Mt-SSB (P16).
- Xenopus Mt-SSBs and Mt-SSBr.
- Drosophila MtSSB.
- Yeast protein RIM1.

[1755] Two signature patterns have been developed for these proteins. The first is a conserved region in the
N-terminal section of the SSB's. The second is a centrally located region which, in Escherichia coli SSB, is known to be
involved in the binding of DNA.

[1756] Description of pattern(s) and/or profile(s)

Consensus pattern [LIVMF]-[NST]-[KRT]-[LIVM]-x-[LIVMF](2)-G-[NHRK]-[LIVM]-[GST]-x^

Consensus pattern T-x-W-[HY]-[RNS]-[LIVM]-x-[LIVMF]-[FY]-[NGKR]

[1] Meyer R.R., Laine P.S. Microbiol. Rev. 54:342-380(1990).
[2] Stroumbakis N.D., Li Z., Tolias P.P. Gene 143:171-177(1994).

[1757] 759. KDPG and KHG aldolases active site signatures

PROSITE cross-reference(s): PS00159; ALDOLASE_KDPG_KHG_1, PS00160; ALDOLASE_KDPG_KHG_2
[1758] 4-hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) (KHG-aldolase) catalyzes the interconversion of 4-hydroxy-
2-oxoglutarate into pyruvate and glyoxylate. Phospho-2-dehydro-3-deoxygluconate aldolase (EC 4.1.2.14)
(KDPG-aldolase) catalyzes the interconversion of 6-phospho-2-dehydro-3-deoxy-D-gluconate into pyruvate and
glyceraldehyde 3-phosphate.

[1759] These two enzymes are structurally and functionally related [1]. They are both homotrimeric proteins of
approximately 220 amino-acid residues. They are class I aldolases whose catalytic mechanism involves the formation
of a Schiff-base intermediate between the substrate and the epsilon-amino group of a lysine residue. In both enzymes,
an arginine is required for catalytic activity.

[1760] Two signature patterns were developed for these enzymes. The first one contains the active site arginine and
the second, the lysine involved in the Schiff-base formation.
[1761] Description of pattern(s) and/or profile(s)

Consensus pattern G-[LIVM]-x(3)-E-[LIV]-T-[LF]-R [R is the active site residue]
Consensus pattern G-x(3)-[LIVMF]-K-[LF]-F-P-[SA]-x(3)-G [K is involved in Schiff-base formation]

[1762] [1] Vlahos C.J., Dekker E.E. J. Biol. Chem. 263:11683-11691(1988).

[1763] 760. AP endonucleases family 1 signatures. PROSITE cross-reference(s): PS00726;
AP_NUCLEASE_F1_1, PS00727; AP_NUCLEASE_F1_2, PS00728; AP_NUCLEASE_F1_3

[1764] DNA damaging agents such as the antitumor drugs bleomycin and neocarzinostatin or those that generate
oxygen radicals produce a variety of lesions in DNA. Amongst these is base-loss, which forms apurinic/apyrimidinic
(AP) sites, or strand breaks with atypical 3' termini. DNA repair at the AP sites is initiated by specific endonuclease
cleavage of the phosphodiester backbone. Such endonucleases are also generally capable of removing blocking
groups from the 3' terminus of DNA strand breaks.

[1765] AP endonucleases can be classified into two families on the basis of sequence similarity. Family 1 groups
the enzymes listed below [1].

Escherichia coli exonuclease III (EC 3.1.11.2) (gene xthA).

Streptococcus pneumoniae and Bacillus subtilis exonuclease A (gene exoA).

Mammalian AP endonuclease 1 (AP1) (EC 4.2.99.18).

Drosophila recombination repair protein 1 (gene Rrp1). 

Arabidopsis thaliana apurinic endonuclease-redox protein (gene arp). 

[1766] Except for Rrp1 and arp, these enzymes are proteins of about 300 amino-acid residues. Rrp1 and arp both
contain additional and unrelated sequences in their N-terminal section (about 400 residues for Rrp1 and 270 for arp).
[1767] Three signature patterns were developed for this family of enzymes. The patterns are based on the most
conserved regions. The first pattern contains a glutamate which has been shown [2], in the Escherichia coli enzyme,
to bind a divalent metal ion such as magnesium or manganese.

[1768] Consensus pattern [APF]-D-[LIVMF](2)-x-[LIVM]-Q-E-x-K [E binds a divalent metal ion]

Consensus pattern D-[ST]-[FY]-R-[KH]-x(7,8)-[FYW]-[ST]-[FYW](2)
Consensus pattern N-x-G-x-R-[LIVM]-D-[LIVMFYH]-x-[LV]-x-S
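
The second pattern above contains a variable-length spacer, x(7,8). When such a pattern is scanned with a regular expression, the spacer can be placed in a capture group so that the length actually used in each hit is reported. The sketch below illustrates this; the regular expression is a hand translation of the second consensus pattern, and the function name spacer_lengths and the test strings are hypothetical.

```python
import re

# Hand translation of D-[ST]-[FY]-R-[KH]-x(7,8)-[FYW]-[ST]-[FYW](2);
# the variable spacer x(7,8) is captured as group 1.
AP_F1_SIGNATURE_2 = re.compile(r"D[ST][FY]R[KH](.{7,8})[FYW][ST][FYW]{2}")


def spacer_lengths(sequence: str):
    """Return (1-based start, spacer length) for every non-overlapping hit."""
    return [
        (m.start() + 1, len(m.group(1)))
        for m in AP_F1_SIGNATURE_2.finditer(sequence.upper())
    ]


# Synthetic strings written to match with a 7-residue and an 8-residue spacer.
print(spacer_lengths("QQDSFRKAAAAAAAFSFFQQ"))
print(spacer_lengths("DSFRKAAAAAAAAFSFF"))
```

The regular expression engine tries the longer spacer first and falls back to the shorter one, so a sequence that could satisfy the pattern with either length is reported with the 8-residue spacer.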

[1] Barzilay G., Hickson I.S. BioEssays 17:713-719(1995).

[2] Mol C.D., Kuo C.-F., Thayer M.M., Cunningham R.P., Tainer J.A. Nature 374:381-386(1995).

[1769] 761. (ER)Enhancer of rudimentary signature, PROSITE cross-reference(s): PS01290; ER 

[1770] The Drosophila protein 'enhancer of rudimentary' (gene e(r)) is a small protein of 104 residues whose function
is not yet clear. From an evolutionary point of view, it is highly conserved [1] and has been found to exist in probably
all multicellular eukaryotic organisms. It has been proposed that this protein plays a role in the cell cycle.

[1771] A conserved region in the central part of the protein was selected as a signature pattern.

[1772] Consensus pattern Y-D-I-[SA]-x-L-[FY]-x-F-[IV]-D-x(3)-D-[LIV]-S

[1773] [1] Gelsthorpe M., Pulumati M., McCallum C., Dang-Vu K., Tsubota S.I. Gene 186:189-195(1997).

[1774] 762. (ETF alpha) Electron transfer flavoprotein alpha-subunit signature. PROSITE cross-reference(s): 

PS00696; ETF_ALPHA 

[1775] The electron transfer flavoprotein (ETF) [1,2] serves as a specific electron acceptor for various mitochondrial
dehydrogenases. ETF transfers electrons to the main respiratory chain via ETF-ubiquinone oxidoreductase. ETF is a
heterodimer that consists of an alpha and a beta subunit and binds one molecule of FAD per dimer. A similar
system also exists in some bacteria.

[1776] The alpha subunit of ETF is a protein of about 32 Kd which is structurally related to the bacterial nitrogen 
fixation protein fixB which could play a role in a redox process and feed electrons to ferredoxin. 
[1777] Other related proteins are: 

Escherichia coli hypothetical protein ydiR. 
Escherichia coli hypothetical protein ygcQ. 

[1778] A highly conserved region which is located in the C-terminal section was selected as a signature pattern for 
these proteins. 

[1779] Consensus pattern [LI]-Y-[LIVM]-[AT]-x-G-[IV]-[SD]-G-x-[IV]-Q-H-x(2)-G-x(6)-[IV]-x-A-[IV]-N

[ 1] Finocchiaro G., Ikeda Y., Ito M., Tanaka K. Prog. Clin. Biol. Res. 321:637-652(1990). 
[2] Tsai M.H., Saier M.H. Jr. Res. Microbiol. 146:397-404(1995). 

[1780] 763. (lectin c) C-type lectin domain signature and profile

PROSITE cross-reference(s): PS00615; C_TYPE_LECTIN_1, PS50041; C_TYPE_LECTIN_2
[1781] A number of different families of proteins share a conserved domain which was first characterized in some
animal lectins and which seems to function as a calcium-dependent carbohydrate-recognition domain [1,2,3]. This domain,
which is known as the C-type lectin domain (CTL) or as the carbohydrate-recognition domain (CRD), consists
of about 110 to 130 residues. There are four cysteines which are perfectly conserved and involved in two disulfide
bonds. A schematic representation of the CTL domain is shown below.






I I 

xcxxxxcxxxxxxxCxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxCxxxxWxCx 
xxxCx 

+.„.+ , + 



'C': conserved cysteine involved in a disulfide bond,
'c': optional cysteine involved in a disulfide bond, 
position of the pattern. 

[1782] The categories of proteins in which the CTL domain has been found are listed below.

[1783] Type-II membrane proteins where the CTL domain is located at the C-terminal extremity of the proteins:

- Asialoglycoprotein receptors (ASGPR) (also known as hepatic lectins) [4]. The ASGPRs mediate the endocytosis
of plasma glycoproteins to which the terminal sia[...]
- Low affinity immunoglobulin epsilon Fc receptor (lymphocyte IgE receptor), which plays an essential role in the
regulation of IgE production and in the differentiation of B cells.
- Kupffer cell receptor. A receptor with an affinity for galactose and fucose, that could [...]
- A number of proteins expressed on the surface of natural killer T-cells: NKG2, NKR-P1, YE1[...]
and on B-cells: CD72, LyB-2. The CTL-domain in these proteins is distantly related to other CTL-domains; it is
unclear whether they are likely to bind carbohydrates.
[1784] Proteins that consist of an N-terminal collagenous domain followed by a CTL-domain [5]; these proteins are
sometimes called 'collectins':

lung. 

plexes through the complement component (iC3b). 
- Mannan-binding proteins (MBP) (also known as mannose-binding proteins).
MBPs bind mannose and N-acetyl-D-glucosamine in a calcium-dependent
manner.

- Bovine collectin-43 (CL-43). 

[1785] Selectins (or LEC-CAM) [6,7]. Selectins are cell adhesion molecules implicated in the interaction of leukocytes
with platelets or vascular endothelium. Structurally, selectins consist of a long extracellular domain, followed by a
transmembrane region and a short cytoplasmic domain. The extracellular domain is itself composed of a CTL-domain,
followed by an EGF-like domain and a variable number of SCR/Sushi repeats. Known selectins are:

- Lymph node homing receptor (also known as L-selectin, leukocyte adhesion
molecule-1 (LAM-1), leu-8, gp90-mel, or LECAM-1).
- Endothelial leukocyte adhesion molecule 1 (ELAM-1, E-selectin or LECAM-2).

[1786] The ligand recognized by ELAM-1 is sialyl-Lewis x.

- Granule membrane protein 140 (GMP-140, P-selectin, PADGEM, CD62, or LECAM-3).
The ligand recognized by GMP-140 is Lewis x.

[1787] Large proteoglycans that contain a CTL-domain followed by one copy of a SCR/Sushi repeat, in their C-terminal
section:

- Aggrecan (cartilage-specific proteoglycan core protein). This proteoglycan is a major component of the extracellular
matrix of cartilaginous tissues where it has a role in the resistance to compression.
- Brevican.
- Neurocan.
- Versican (large fibroblast proteoglycan), a large chondroitin sulfate proteoglycan that may play a role in intercellular
signalling.

[1788] In addition to the CTL and Sushi domains, these proteins also contain, in their N-terminal domain, an Ig-like
V-type region, two or four link domains (see <PDOC00955>) and up to two EGF-like repeats.
[1789] Two type-I membrane proteins:

- Mannose receptor from macrophages. This protein mediates the endocytosis of
glycoproteins by macrophages in several recognition and uptake processes. Its extracellular section consists of a
fibronectin type II domain followed by eight tandem repeats of the CTL domain.
- 180 Kd secretory phospholipase A2 receptor (PLA2-R). A protein whose
structure is highly similar to that of the mannose receptor.
- DEC-205 receptor. This protein is used by dendritic cells and thymic epithelial cells to capture and endocytose
diverse carbohydrate-binding antigens and direct them to antigen-processing cellular compartments. The DEC-205
extracellular section consists of a fibronectin type II domain followed by ten tandem repeats of the CTL domain.
- Silk moth hemocytin, a humoral lectin which is involved in a self-defence mechanism. It is composed of 2 FA58C
domains (see <PDOC00988>), a CTL domain, 2 VWFC domains (see <PDOC00928>), and a CTCK (see <PDOC00912>).
[1790] Various other proteins that uniquely consist of a CTL domain:

- Invertebrate soluble galactose-binding lectins. A category to which belong a humoral lectin from a flesh fly; echinoidin,
a lectin from the coelomic fluid of a sea urchin; BRA-2 and BRA-3, two lectins from the coelomic fluid of a
barnacle; a lectin from the tunicate Polyandrocarpa misakiensis; and a newt oviduct lectin. The physiological importance
of these lectins is not yet known but they may play an important role in defense mechanisms.
- Pancreatic stone protein (PSP) (also known as pancreatic thread protein (PTP), or reg), a protein that might act
as an inhibitor of spontaneous calcium carbonate precipitation.
- Pancreatitis associated protein (PAP), a protein that might be involved in the control of bacterial proliferation.
- Tetranectin, a plasma protein that binds to plasminogen and to isolated kringle 4.
- Eosinophil granule major basic protein (MBP), a cytotoxic protein.
- A galactose specific lectin from a rattlesnake.
- Two subunits of a coagulation factor IX/factor X-binding protein (IX/X-bp), a snake venom anticoagulant protein
which binds with factors IX and X in the presence of calcium.
- Two subunits of a phospholipase A2 inhibitor from the plasma of a snake (PLI-A and PLI-B).
- A lipopolysaccharide-binding protein (LPS-BP) from the hemolymph of a
cockroach [8].
- Sea raven antifreeze protein (AFP) [9].

[1791] As a signature pattern for this domain, the C-terminal region with its three conserved cysteines was selected.
[1792] Consensus pattern C-[LIVMFYATG]-x(5,12)-[WL]-x-[DNSR]-x(2)-C-x(5,6)-[FYWLIVSTA]-[LIVMSTA]-C [The
three C's are involved in disulfide bonds]

Note: all CTL domains have a Trp residue five positions before the second Cys, with the exception of tunicate lectin and cockroach
LPS-BP which have Leu.
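
The spacing constraints in this signature can be made concrete with a short Python sketch. It assumes the pattern reads C-[LIVMFYATG]-x(5,12)-[WL]-x-[DNSR]-x(2)-C-x(5,6)-[FYWLIVSTA]-[LIVMSTA]-C; the function name `ctl_signature` and the test sequences are hypothetical. Named groups report the 1-based positions of the three cysteines and whether the residue five positions before the second Cys is Trp or Leu.

```python
import re

# Hand-translated form of the consensus pattern (an assumption of this sketch).
CTL = re.compile(
    r"(?P<c1>C)[LIVMFYATG].{5,12}(?P<wl>[WL]).[DNSR].{2}"
    r"(?P<c2>C).{5,6}[FYWLIVSTA][LIVMSTA](?P<c3>C)"
)

def ctl_signature(seq):
    """Return cysteine positions (1-based) and the Trp/Leu residue, or None."""
    m = CTL.search(seq)
    if m is None:
        return None
    return {
        "cysteines": [m.start(g) + 1 for g in ("c1", "c2", "c3")],
        "residue_before_second_cys": m.group("wl"),
    }

# Made-up sequence containing one occurrence of the signature.
result = ctl_signature("AACLAAAAAWADAACAAAAAFLCAA")
```

Because the pattern is only a signature for the C-terminal region, a hit suggests but does not establish the presence of a complete CTL domain.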

Note: this documentation entry is linked to both a signature pattern and a profile. As the profile is much more sensitive
than the pattern, you should use it if you have access to the necessary software tools to do so.

[1] Drickamer K. J. Biol. Chem. 263:9557-9560(1988).
[2] Drickamer K. Prog. Nucleic Acid Res. Mol. Biol. 45:207-232(1993).
[3] Drickamer K. Curr. Opin. Struct. Biol. 3:393-400(1993).
[4] Spiess M. Biochemistry 29:10009-10018(1990).
[5] Weis W.I., Kahn R., Fourme R., Drickamer K., Hendrickson W.A. Science 254:1608-1615(1991).
[6] Siegelman M. Curr. Biol. 1:125-128(1991).
[7] Lasky L.A. Science 238:964-969(1992).
[8] Jomori T., Natori S. J. Biol. Chem. 266:13318-13323(1991).
[9] Ng N.F.L., Hew C.-L. J. Biol. Chem. 267:16069-16075(1992).

[1793] 764 (SRCR) Speract receptor repeated domain signature 

The Signature pattern that was derived spans part of the N-terminal section of the domain and contains 8 of 
iryirSn^Tnsr^a^^^^^^^ 

Acad. Sci. U.S.A. 87:8810-8814(1990). 
[1798] 765. Bac_surface_Ag 

?hTeZ'rnc^^ste'^^^^^^^^ surface antigens; D15 antigen from H.influenzae. OMA87 from P.multocida. OMP85 
from N.meningitidis and N. gonorrhoeae. Number of members: 14 

[1] Medline: 95255676. The sequencing of the 80-kDa D15 protective surface antigen of Haemophilus influenzae.

[...] Pasteurella multocida 87-kilodalton outer membrane antigen. Ruffolo CG, Adler B; Infect Immun 1996;64:
3161-3167.

This suggests that pairs of BRCT domains 

associate as homo- or heterodimers. Number of members: 131 

[1] ly^edline: 96259550. BRCA1 protein products ...Functional motifs... Koonin EV. Altschul SF. Bork P; Nature 
f "M^r' 971M2??From BRCA1 to RAP1: A widespread BRCT module closely associated with DNA repair 
f3rSSni: « 7^:^:^^. domains in ONA damage respon- c.^^^^^^^ checkpoint 

[4] Medline: 99016060. Structure of an XRCC1 BRCT domain: a new protein-protein interaction module. Zhang
X, Morera S, Bates PA, Whitehead PC, Coffer AI, Hainbucher K, Nash RA, Sternberg MJ, Lindahl T, Freemont PS;
[1800] 767. Kappa casein 

Kappa-casein is a mammalian milk protein involved in a number of important physiological processes. In the gut, the
ingested protein is split into an insoluble peptide (para kappa-casein) and a soluble hydrophilic glycopeptide (caseinomacropeptide).

Caseinomacropeptide is responsible for increased efficiency of digestion, prevention of neonate hypersensitivity to
ingested proteins, and inhibition of gastric pathogens. Number of members: 56

[1801] [1] Medline: 98072500. Nucleotide sequence evolution at the kappa-casein locus: evidence for positive selection
within the family Bovidae. Ward TJ, Honeycutt RL, Derr JN; Genetics 1997;147:1863-1872.
[1802] 768. Chitinases family 18 active site 
PROSITE cross-reference(s) CHITINASE_18 

Chitinases (EC 3.2.1.14) [1] are enzymes that catalyze the hydrolysis of the beta-1,4-N-acetyl-D-glucosamine linkages
in chitin polymers. From the viewpoint of sequence similarity, chitinases belong to either family 18 or 19 in the classification
of glycosyl hydrolases [2,E1]. Chitinases of family 18 (also known as classes III or V) group a variety of proteins:

a) Chitinases from:

- Prokaryotes such as Alteromonas, Bacillus, Serratia, Streptomyces, etc.
- Plants such as Arabidopsis, cucumber, bean, tobacco, etc.
- Fungi such as Aphanocladium, Rhizopus, Saccharomyces, etc.
- Nematode (Brugia malayi).
- Insects (Manduca sexta).
- Baculoviruses (Autographa californica nuclear polyhedrosis virus).

b) Other proteins:

- Hevamine, a rubber tree protein with chitinase and lysozyme activities.
- Kluyveromyces lactis killer toxin alpha subunit, which acts as a chitinase.
- Flavobacterium and Streptomyces endo-beta-N-acetylglucosaminidases (EC 3.2.1.96).
- Mammalian di-N-acetylchitobiase which is involved in the degradation of asparagine-linked glycoproteins.
- Human cartilage glycoprotein Gp-39.
- Jack bean concanavalin B (conB), a protein that has lost its catalytic activity.

[1803] Site directed mutagenesis experiments [3] and crystallographic data [4,5] have shown that a conserved glutamate
is involved in the catalytic mechanism and probably acts as a proton donor. This glutamate is at the extremity of
the best conserved region in these proteins.

[1804] Consensus pattern [LIVMFY]-[DN]-G-[LIVMF]-[DN]-[LIVMF]-[DN]-x-E [E is the active site residue]

[1] Flach J., Pilet P.-E., Jolles P. Experientia 48:701-716(1992).

[2] Henrissat B. Biochem. J. 280:309-316(1991).

[3] Watanabe T., Kohori K., Miyashita K., Fujii T., Sakai H., Uchida M., Tanaka H. J. Biol. Chem. 268:18567-18572
(1993).

[4] Perrakis A., Tews I., Dauter Z., Oppenheim A.B., Chet I., Wilson K.S., Vorgias C.E. Structure 2:1169-1180
(1994).

[5] van Scheltinga A.C.T., Kalk K.H., Beintema J.J., Dijkstra B.W. Structure 2:1181-1189(1994).
[1805] 769. gag_p17. gag gene protein p17 (matrix protein).

The matrix protein forms an icosahedral shell associated with the inner membrane of the mature immunodeficiency
virus. Number of members: 1598

[1806] [1] Medline: 95055757. Three-dimensional structure of the human immunodeficiency virus type 1 matrix protein.
Massiah MA, Starich MR, Paschall C, Summers MF, Christensen AM, Sundquist WI; J Mol Biol 1994;244:198-223.
[1807] 770. GDA1/CD39 family of nucleoside phosphatases signature 
PROSITE cross-reference(s): GDA1_CD39_NTPASE

A number of nucleoside diphosphate and triphosphate hydrolases as well as some yet uncharacterized proteins have
been found to belong to the same family [1,2]. This family currently consists of:

- Yeast guanosine-diphosphatase (EC 3.6.1.42) (GDPase) (gene GDA1). GDA1 is a Golgi integral membrane enzyme
that catalyzes the hydrolysis of GDP to GMP.
- Potato apyrase (EC 3.6.1.5) (adenosine diphosphatase) (ADPase). Apyrase acts on both ATP and ADP to produce [...]
- Mammalian vascular ATP-diphosphohydrolase (EC 3.6.1.5) (also known as lymphoid cell activation antigen CD39).
- Toxoplasma gondii nucleoside-triphosphatases (EC 3.6.1.15) (NTPase). NTPase hydrolyses various nucleoside
triphosphates to produce the corresponding nucleoside mono- and diphosphates. The enzyme is secreted into
the invaded host cell, into the parasitophorous vacuole, a specialized compartment where the parasite intracellularly
resides.
- Pea nucleoside-triphosphatases (EC 3.6.1.15) (NTPase).
- Caenorhabditis elegans hypothetical protein C33H5.14.
- Caenorhabditis elegans hypothetical protein R07E4.4.
- Yeast chromosome V hypothetical protein YER005w.

[1808] The above uncharacterized proteins all seem to be membrane-bound.

[1809] All these proteins share a number of conserved domains. The best conserved of these domains has been
selected. It is located in the central section of the proteins.

[1810] Consensus pattern [LIVM]-x-G-x(2)-E-G-x-[FY]-x-[FW]-[LIVA]-[TAG]-x-N-[HY]

[1] Handa M., Guidotti G. Biochem. Biophys. Res. Commun. 218:916-923(1996).

[2] Vasconcelos E.G., Ferreira S.T., de Carvalho T.M.U., de Souza W., Kettlun A.M., Mancilla M., Valenzuela M.A.,
Verjovski-Almeida S. J. Biol. Chem. 271:22139-22145(1996).

[1811] 771. GTP cyclohydrolase I signatures

PROSITE cross-reference(s): GTP_CYCLOHYDROL_1_1, GTP_CYCLOHYDROL_1_2

GTP cyclohydrolase I (EC 3.5.4.16) catalyzes the biosynthesis of formic acid and dihydroneopterin triphosphate from GTP. This reaction is the
first step in the biosynthesis of tetrahydrofolate in prokaryotes, of tetrahydrobiopterin in vertebrates, and of pteridine-[...]

[1812] GTP cyclohydrolase I is a protein of from 190 to 250 amino acid residues. The comparison of the sequence
of the enzyme from bacterial and eukaryotic sources shows that the structure of this enzyme has been extremely well
conserved [...]

[1813] [...]tide which is part of the GTP-binding pocket [2]; the second region also contains conserved residues involved in GTP-binding.

[1814] Consensus pattern [DEN]-[LIVM](2)-x(2)-[KRNQ]-[DEN]-[LIVM]-x(3)-[ST]-x-C-E-H-H
Consensus pattern [SA]-x-[RK]-x-Q-[LIVM]-Q-E-[RN]-[LI]-[TSN]

[1] Maier J., Witter K., Guetlich M., Ziegler I., Werner T., Ninnemann H. Biochem. Biophys. Res. Commun. 212:[...]

[2] Nar H., Huber R., Meining W., Schmid C., Weinkauf S., Bacher A. Structure 3:459-466(1995).

[1815] 772. IlvC. Acetohydroxy acid isomeroreductase

Acetohydroxy acid isomeroreductase catalyses the conversion of acetohydroxy acids into dihydroxy valerates. This
reaction is the second in the synthetic pathway of the essential branched side chain amino acids valine and isoleucine.

[1816] [1] Medline: 97361822. The crystal structure of plant acetohydroxy acid isomeroreductase complexed with
NADPH, two magnesium ions and a herbicidal transition state analog determined at 1.65 A resolution. Biou V, Dumas
R, Cohen-Addad C, Douce R, Job D, Pebay-Peyroula E; EMBO J 1997;16:3405-3415.
[1817] 773. Prokaryotic membrane lipoprotein lipid attachment site 
PROSITE cross-reference(s): PROKAR_LIPOPROTEIN

In prokaryotes, membrane lipoproteins are synthesized with a precursor signal peptide which is cleaved by a specific
lipoprotein signal peptidase (signal peptidase II). The peptidase recognizes a conserved sequence and cuts upstream
of a cysteine residue to which a glyceride-fatty acid lipid is attached [1]. Some of the proteins known to undergo such
processing currently include (for recent listings see [1,2,3]):

- Major outer membrane lipoprotein (murein-lipoproteins) (gene lpp).
Escherichia coli lipoprotein-28 (gene nlpA).
Escherichia coli lipoprotein-34 (gene nlpB).
Escherichia coli lipoprotein nlpC.
Escherichia coli lipoprotein nlpD.

Escherichia coli osmotically inducible lipoprotein B (gene osmB). 
Escherichia coli osmotically inducible lipoprotein E (gene osmE). 
Escherichia coli peptidoglycan-associated lipoprotein (gene pal). 
Escherichia coli rare lipoproteins A and B (genes rplA and rplB). 
Escherichia coli copper homeostasis protein cutF (or nlpE).
Escherichia coli plasmids traT proteins. 
Escherichia coli Col plasmids lysis proteins. 
A number of Bacillus beta-lactamases. 

Bacillus subtilis periplasmic oligopeptide-binding protein (gene oppA). 
Borrelia burgdorferi outer surface proteins A and B (genes ospA and ospB). 
Borrelia hermsii variable major protein 21 (gene vmp21 ) and 7 (gene vmp7). 
Chlamydia trachomatis outer membrane protein 3 (gene omp3). 
Fibrobacter succinogenes endoglucanase cel-3.
Haemophilus influenzae proteins Pal and Pep.
Klebsiella pullulanase (gene pulA).
Klebsiella pullulanase secretion protein pulS.
Mycoplasma hyorhinis protein p37. 

Mycoplasma hyorhinis variant surface antigens A, B, and C (genes vlpABC). 
Neisseria outer membrane protein H.8. 
Pseudomonas aeruginosa lipopeptide (gene lppL).
Pseudomonas solanacearum endoglucanase egl. 

Rhodopseudomonas viridis reaction center cytochrome subunit (gene cytC). 
Rickettsia 17 Kd antigen. 

Shigella flexneri invasion plasmid proteins mxiJ and mxiM.

Streptococcus pneumoniae oligopeptide transport protein A (gene amiA). 

Treponema pallidum 34 Kd antigen.

Treponema pallidum membrane protein A (gene tmpA).

Vibrio harveyi chitobiase (gene chb). 

Yersinia virulence plasmid protein yscJ. 

Halocyanin from Natronobacterium pharaonis [4], a membrane associated copper-binding protein (this is the first
archaebacterial protein known to be modified in such a fashion).

[1818] From the precursor sequences of all these proteins, we derived a consensus pattern and a set of rules to
identify this type of post-translational modification.

[1819] Consensus pattern {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C [C is the lipid attachment site]
Additional rules: 1) The cysteine must be between positions 15 and 35 of the sequence in consideration. 2) There must
be at least one Lys or one Arg in the first seven positions of the sequence.
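
The pattern and its two additional rules can be combined in a few lines of Python. This is a minimal sketch, assuming the pattern reads {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C and that positions are counted from 1 at the start of the precursor; the function name `lipid_attachment_sites` and the example sequences are hypothetical.

```python
import re

# Hand-translated pattern; the lookahead allows overlapping candidates.
LIPOBOX = re.compile(r"(?=([^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C))")

def lipid_attachment_sites(precursor):
    """Return 1-based positions of cysteines meeting the pattern and both rules."""
    seq = precursor.upper()
    # Rule 2: at least one Lys or Arg in the first seven positions.
    if not re.search(r"[KR]", seq[:7]):
        return []
    sites = []
    for m in LIPOBOX.finditer(seq):
        # The cysteine is the last residue of the matched segment.
        cys_pos = m.start(1) + len(m.group(1))
        # Rule 1: the cysteine must lie between positions 15 and 35.
        if 15 <= cys_pos <= 35:
            sites.append(cys_pos)
    return sites

# Made-up precursor resembling a lipoprotein signal peptide.
sites = lipid_attachment_sites("MKATKLVLGAVILGSTLLAGCSSN")
```

Applying the rules after the pattern match keeps the three conditions separate, so each can be relaxed or tightened independently when screening candidate precursors.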

[1] Hayashi S., Wu H.C. J. Bioenerg. Biomembr. 22:451-471(1990).
[2] Klein P., Somorjai R.L., Lau P.C.K. Protein Eng. 2:15-20(1988).
[3] von Heijne G. Protein Eng. 2:531-534(1989).

[4] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D., Engelhard M. J. Biol. Chem. 269:14939-14945
(1994).

[1820] 774. Aminoacyl-transfer RNA synthetases class-ll signatures 

PROSITE cross-reference(s): AA_TRNA_LIGASE_II_1; AA_TRNA_LIGASE_II_2

Aminoacyl-tRNA synthetases (EC
6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them to specific tRNA molecules as the
first step in protein biosynthesis. In prokaryotic organisms there are at least twenty different types of aminoacyl-tRNA
synthetases, one for each different amino acid. In eukaryotes there are generally two aminoacyl-tRNA synthetases for
each different amino acid: one cytosolic form and a mitochondrial form. While all these enzymes have a common
function, they are widely diverse in terms of subunit size and of quaternary structure.

[1821] The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline,
serine, and threonine are referred to as class-II synthetases [2 to 6] and probably have a common folding pattern
in their catalytic domain for the binding of ATP and amino acid which is different to the Rossmann fold observed for
the class I synthetases [7].
[1822] Class-II tRNA synthetases do not share a high degree of similarity; however, at least three conserved regions
are present [2,5,8]. Signature patterns from two of these regions have been derived.
[1823] Consensus pattern [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]

Consensus pattern [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVMFY]

[1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987).

[2] Delarue M., Moras D. BioEssays 15:675-687(1993).

[3] Schimmel P. Trends Biochem. Sci. 16:1-3(1991).

[4] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).
[5] Cusack S., Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991).
[6] Cusack S. Biochimie 75:1077-1081(1993).

[7] Cusack S., Berthet-Colominas C., Haertlein M., Nassar N., Leberman R. Nature 347:249-255(1990).
[8] Leveque P., Plateau P., Dessen P., Blanquet S. Nucleic Acids Res. 18:305-312(1990).

[1824] 775. X. Trans-activation protein X

This protein is found in hepadnaviruses where it is indispensable for replication. Number of members: 91

[...] Thymidylate synthase catalyzes the reductive methylation of dUMP to dTMP with
concomitant conversion of 5,10-methylenetetrahydrofolate to dihydrofolate. Thymidylate synthase plays an essential role
in DNA synthesis and is an important target for certain chemotherapeutic drugs.

[1827] Thymidylate synthase is an enzyme of about 30 to 35 Kd in most species except in protozoan and plants
where it exists as a bifunctional enzyme that includes a dihydrofolate reductase domain.
[1828] A cysteine residue is involved in the catalytic mechanism (it covalently binds the 5,6-dihydro-dUMP intermediate).
The sequence around the active site of this enzyme is conserved [...]
[1829] Consensus pattern R-x(2)-[LIVM]-x(3)-[FW]-[QN]-x(8,9)-[LV]-x-P-C-[HAVM]-x(3)-[QMT]-[FYW]-x-[LV] [C is
the active site residue]

[1] Benkovic S.J. Annu. Rev. Biochem. 49:227-251(1980).

[2] Ross P., O'Gara F., Condon S. Appl. Environ. Microbiol. 56:2156-2163(1990).

ities. classified into a single family: 

- Lysosomal alpha-glucosidase (EC 3.2.1.20) (acid maltase) is a vertebrate glycosidase active at low pH, which
hydrolyzes alpha(1->4) and alpha(1->6) linkages in glycogen, maltose, and isomaltose.
- Alpha-glucosidase (EC 3.2.1.20) from the yeast Candida tsukubaensis.
. Alpha-'g.ucosidase(EC3.^^^^^^^^^^^ 

are homologous (41% of amino acid identity) and have most probably evolved by duplication. 
- Glucoamylase 1 (EC 3.2.1.3) (glucan 1,4-alpha-glucosidase) from various fungal species.

Yeast hypothetical protein YBR229C. 
. Fission yeast hypothetical protein SpAC30D11.01c. 

[1832] An aspartic acid has been implicated [4] in the catalytic activity of sucrase, isomaltase, and lysosomal
alpha-glucosidase. The region around this active residue is highly conserved and can be used as a signature pattern. A
second region, which contains two conserved cysteines, has been used as an additional signature pattern.
[1833] Consensus pattern [GF]-[LIVMF]-W-x-D-M-[NSA]-E [D is the active site residue]

Consensus pattern G-[AV]-D-[LIVMTA]-C-G-[FY]-x(3)-[ST]-x(3)-L-C-x-R-W-x(2)-[LV]-[GSA]-[SA]-F-x-P-F-x-R-[DN]

[1] Henrissat B. Biochem. J. 280:309-316(1991).

[2] Kinsella B.T., Hogan S., Larkin A., Cantwell B.A. Eur. J. Biochem. 202:657-664(1991).

[3] Naim H.Y., Niermann T., Kleinhans U., Hollenberg C.P., Strasser A.W.M. FEBS Lett. 294:109-112(1991).

[4] Hermans M.M.P., Kroos M.A., van Beeumen J., Oostra B.A., Reuser A.J.J. J. Biol. Chem. 266:13507-13512
(1991).

[1834] 778. Urease signatures
[1835] Urease (EC 3.5.1.5) is a nickel-binding enzyme that catalyzes the hydrolysis of urea to carbon dioxide and
ammonia [1]. Historically, it was the first enzyme to be crystallized (in 1926). It is mainly found in plant seeds, microorganisms
and invertebrates. In plants, urease is a hexamer of identical chains. In bacteria [2], it consists of either two
or three different subunits (alpha, beta and gamma).
[1836] Urease binds two nickel ions per subunit; four histidines, an aspartate and a carbamated lysine serve as ligands
to these metals; an additional histidine is involved in the catalytic mechanism [3].

[1837] As signatures for this enzyme, a region was selected that contains two histidines that bind one of the nickel
ions and the region of the active site histidine.

[1838] Consensus pattern T-[AY]-[GA]-[GAT]-[LIVM]-D-x-H-[LIVM]-H-x(3)-P [The two H's bind nickel]
Consensus pattern [LIVM](2)-[CT]-H-[HN]-L-x(3)-[LIVM]-x(2)-D-[LIVM]-x-F-A [H is the active site residue]
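
Where an entry provides two signatures, both can be checked in one pass and the annotated residues reported. The following Python sketch assumes the two urease patterns read T-[AY]-[GA]-[GAT]-[LIVM]-D-x-H-[LIVM]-H-x(3)-P and [LIVM](2)-[CT]-H-[HN]-L-x(3)-[LIVM]-x(2)-D-[LIVM]-x-F-A; the function name `urease_signatures` and the test sequence are hypothetical.

```python
import re

# Capture groups mark the annotated histidines in each signature.
NICKEL_SITE = re.compile(r"T[AY][GA][GAT][LIVM]D.(H)[LIVM](H).{3}P")
ACTIVE_SITE = re.compile(r"[LIVM]{2}[CT](H)[HN]L.{3}[LIVM].{2}D[LIVM].FA")

def urease_signatures(seq):
    """Report 1-based positions of the nickel-binding and active site histidines."""
    nickel = NICKEL_SITE.search(seq)
    active = ACTIVE_SITE.search(seq)
    return {
        "nickel_binding_his": [nickel.start(1) + 1, nickel.start(2) + 1] if nickel else [],
        "active_site_his": active.start(1) + 1 if active else None,
    }

# Made-up sequence carrying one copy of each signature.
report = urease_signatures("GGTAGGIDTHIHAAAPGGLICHHLAAAVAADLAFA")
```

A sequence matching only one of the two signatures yields a partial report, which may be useful for flagging fragments or divergent homologues.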

[1] Takishima K., Suga T., Mamiya G. Eur. J. Biochem. 175:151-165(1988).

[2] Mobley H.L.T., Hausinger R.P. Microbiol. Rev. 53:85-108(1989).

[3] Jabri E., Carr M.B., Hausinger R.P., Karplus P.A. Science 268:998-1004(1995).

[1839] 779. Tyrosine specific protein phosphatases signature and profiles 

[1840] Tyrosine specific protein phosphatases (EC 3.1.3.48) (PTPase) [1 to 5] are enzymes that catalyze the removal
of a phosphate group attached to a tyrosine residue. These enzymes are very important in the control of cell growth,
proliferation, differentiation and transformation. Multiple forms of PTPase have been characterized and can be classified
into two categories: soluble PTPases and transmembrane receptor proteins that contain PTPase domain(s). The
currently known PTPases are listed below:
[1841] Soluble PTPases. 

- PTPN1 (PTP-1B).
- PTPN2 (T-cell PTPase; TC-PTP).
- PTPN3 (H1) and PTPN4 (MEG), enzymes that contain an N-terminal band 4.1-like domain (see <PDOC00566>)
and could act at junctions between the membrane and cytoskeleton.
- PTPN5 (STEP).
- PTPN6 (PTP-1C; HCP; SHP) and PTPN11 (PTP-2C; SH-PTP3; Syp), enzymes which contain two copies of the
SH2 domain at their N-terminal extremity. The Drosophila protein corkscrew (gene csw) also belongs to this subgroup.
- PTPN7 (LC-PTP; Hematopoietic protein-tyrosine phosphatase; HePTP).
- PTPN8 (70Z-PEP).
- PTPN9 (MEG2).
- PTPN12 (PTP-G1; PTP-P19).
- Yeast PTP1.
- Yeast PTP2 which may be involved in the ubiquitin-mediated protein degradation pathway.
- Fission yeast pyp1 and pyp2 which play a role in inhibiting the onset of mitosis.
- Fission yeast pyp3 which contributes to the dephosphorylation of cdc2.
- Yeast CDC14 which may be involved in chromosome segregation.
- Yersinia virulence plasmid PTPases (gene yopH).
- Autographa californica nuclear polyhedrosis virus 19 Kd PTPase.

[1842] Dual specificity PTPases. 

- DUSP1 (PTPN10; MAP kinase phosphatase-1; MKP-1), which dephosphorylates MAP kinase on both Thr-183
and Tyr-185.
- DUSP2 (PAC-1), a nuclear enzyme that dephosphorylates MAP kinases ERK1 and ERK2 on both Thr and Tyr
residues.
- DUSP3 (VHR).
- DUSP4 (HVH2).
- DUSP5 (HVH3).
- DUSP6 (Pyst1; MKP-3).
- DUSP7 (Pyst2; MKP-X).
- Yeast MSG5, a PTPase that dephosphorylates MAP kinase FUS3.
- Yeast YVH1.
- Vaccinia virus H1 PTPase, a dual specificity phosphatase.

[1843] Receptor PTPases.
[1844] Structurally, all known receptor PTPases are made up of a variable length extracellular domain, followed by
a transmembrane region and a C-terminal catalytic cytoplasmic domain. Some of the receptor PTPases contain
fibronectin type III (FN-III) repeats, immunoglobulin-like domains, MAM domains or carbonic anhydrase-like domains
in their extracellular region. The cytoplasmic region generally contains two copies of the PTPase domain. The first
seems to have enzymatic activity, while the second is inactive but seems to affect substrate specificity of the first. In
these domains, the catalytic cysteine is generally conserved but some other, presumably important, residues are not.

[1845] In the following table, the domain structure of known receptor PTPases is shown:
[1846] PTPase domains consist of about 300 amino acids. There are two conserved cysteines; the second one has
been shown to be required for activity. Furthermore, a number of conserved residues in its immediate vicinity
[...]

[1847] A signature pattern was derived for PTPase domains centered on the active site cysteine.

[1848] There are three profiles for PTPases; the first one spans the complete domain and is not specific to any
subtype. The second profile is specific to dual-specificity PTPases and the third one to the PTP subfamily.

[1849] Consensus pattern [LIVMF]-H-C-x(2)-G-x(3)-[...] [C is the active site residue]

[1850] Note: the M-phase inducer phosphatases (cdc25-type phosphatase) are tyrosine-protein phosphatases that
[...]

[1851] Note: this documentation entry is linked to both a signature pattern and to profiles. As profiles are much more
sensitive than the pattern, you should use them if you have access to the necessary software tools to do so.

[1] Fischer E.H., Charbonneau H., Tonks N.K. Science 253:401-406(1991).
[2] Charbonneau H., Tonks N.K. Annu. Rev. Cell Biol. 8:463-493(1992).
[3] Trowbridge I.S. J. Biol. Chem. 266:23517-23520(1991).
[4] Tonks N.K., Charbonneau H. Trends Biochem. Sci. 14:497-500(1989).
[5] Hunter T. Cell 58:1013-1016(1989).

[1853] Gap junctions are specialized regions of the plasma membrane which consist of closely packed pairs of
transmembrane channels, the connexons, through which small molecules diffuse from a cell to a neighboring cell. Each
connexon is composed of a hexamer of an integral membrane protein which is often referred to as connexin. In a
given species there are a number of different, yet structurally related, tissue specific forms of connexins. The types
of connexins which are currently known are listed below.

- Connexin 56 (Cx56).
- Connexin 50 (Cx50) (lens fiber protein MP70).
- Connexin 46 (Cx46) (alpha-3).
- Connexin 45 (Cx45) (alpha-6).
- Connexin 43 (Cx43) (alpha-1).
- Connexin 40 (Cx40) (alpha-5).
- Connexin 38 (Cx38) (alpha-2).
- Connexin 37 (Cx37) (alpha-4).
- Connexin 33 (Cx33) (alpha-7).
- Connexin 32 (Cx32) (beta-1).
- Connexin 31.1 (Cx31.1) (beta-4).
- Connexin 31 (Cx31) (beta-3).
- Connexin 30.3 (Cx30.3) (beta-5).
- Connexin 26 (Cx26) (beta-2).

[1854] Structurally the connexins consist of a short cytoplasmic N-terminal domain, followed by four transmembrane
segments that delimit two extracellular loops and one cytoplasmic loop; the C-terminal domain is cytoplasmic and its length
is variable (from 20 residues in Cx26 to 260 residues in Cx56). The schematic representation of this structure is shown
below.

[Schematic: the NH2- and COOH-termini lie on the cytoplasmic side; four transmembrane segments cross the membrane, delimiting two extracellular loops and one cytoplasmic loop.]

[1855] The sequences of the two extracellular loops are well conserved. In both loops there are three conserved
cysteines which are involved in disulfide bonds. A signature pattern has been built from each of these two loop regions.
[1856] Consensus pattern: C-[DN]-T-x-Q-P-G-C-x(2)-V-C-[FY]-D [The three C's are involved in disulfide bonds]
Consensus pattern: C-x(3,4)-P-C-x(3)-[LIVM]-[DEN]-C-[FY]-[LIVM]-[SA]-[KR]-P [The three C's are involved in disulfide bonds]
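The consensus patterns in this document follow a compact notation: residues are separated by hyphens, `x` stands for any residue, square brackets list the allowed residues, and a number in parentheses gives a repeat count or range. As a minimal sketch of how such a pattern could be scanned against a sequence, the Python below translates the notation into a regular expression. The function name `prosite_to_regex` and the toy sequence are illustrative assumptions, not part of the source, and only the notation elements used in this document are handled.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a hyphen-separated consensus pattern into a Python regex.

    Handles: single residues, x (any residue), [..] (allowed residues),
    {..} (excluded residues) and repeat suffixes such as (2) or (3,4).
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Separate an optional repeat suffix from the residue specification.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # any residue except those listed
        # Bracketed classes and single letters are already valid regex.
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)

# First connexin consensus pattern, as given in the text.
connexin_1 = prosite_to_regex("C-[DN]-T-x-Q-P-G-C-x(2)-V-C-[FY]-D")

# Hypothetical toy sequence containing one occurrence of the pattern.
toy_sequence = "AAACNTAQPGCKRVCYDLL"
hit = re.search(connexin_1, toy_sequence)
print(connexin_1, hit.start() if hit else None)
```

A plain regular expression reports only exact pattern matches; it does not reproduce the extra sensitivity that the document attributes to profiles.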

[1857] [ 1] Goodenough D.A., Goliger J.A., Paul D.L. Annu. Rev. Biochem. 65:475-502(1996).
[1858] 781. Gram-positive cocci surface proteins 'anchoring' hexapeptide

[1859] Surface proteins from Gram-positive cocci contain a conserved hexapeptide located a few residues down-
stream of a hydrophobic C-terminal membrane anchor region which is followed by a cluster of basic amino acids [1].
This structure is represented in the following schematic representation:



+--------------------------------------+-+--------+-+
| Variable length extracellular domain |H| Anchor |B|
+--------------------------------------+-+--------+-+

'H': conserved hexapeptide.
'B': cluster of basic residues.
[1860] It has been proposed that this hexapeptide sequence is responsible for a post-translational modification nec-
essary for the proper anchoring of the proteins which bear it to the cell wall.
Proteins known to contain such a hexapeptide are listed below:

- Aggregation substance from Streptococcus faecalis (asa1).

- C5a peptidase from Streptococcus pyogenes (scpA). 

- C protein alpha-antigen from Streptococcus agalactiae (bca). 

- Cell surface antigen I/II (PAc) from Streptococcus mutans.

- Dextranase from Streptococcus downei (dex). 

- Fibronectin-binding protein from Staphylococcus aureus (fnbA).

- Fimbrial subunits from Actinomyces naeslundii and viscosus. 

- IgA binding protein from Streptococcus pyogenes (arp4). 

- IgA binding protein (B antigen) from Streptococcus agalactiae (bag). 

- IgG binding proteins from Streptococci and Staphylococcus aureus.
- Internalin A from Listeria monocytogenes (inlA).

- M proteins from streptococci.

- Muramidase-released protein from Streptococcus suis (mrp). 

- Nisin leader peptide processing protease from Lactococcus lactis (nisP). 

- Protein A from Staphylococcus aureus. 

- Trypsin-resistant surface T protein from streptococci. 

- Wall-associated protein from Streptococcus mutans (wapA). 

- Wall-associated serine proteinases from Lactococcus lactis. 

[1861] Consensus pattern: L-P-x-T-G-[STGAVDE]
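Because the hexapeptide is described relative to the C-terminal anchor region, a scan for it is more informative when it also reports where each match sits in the sequence. The sketch below, in Python, writes the consensus pattern L-P-x-T-G-[STGAVDE] directly as a regular expression and returns the start position, the matched hexapeptide and the number of residues remaining after it. The function name and the toy sequence are hypothetical and serve only as illustration.

```python
import re

# L-P-x-T-G-[STGAVDE] written as a regular expression.
ANCHOR_MOTIF = re.compile(r"LP.TG[STGAVDE]")

def find_anchor_hexapeptides(sequence: str):
    """Return (start, hexapeptide, residues_after) for every motif match.

    start is 0-based; residues_after counts the residues between the end
    of the hexapeptide and the C-terminus.
    """
    hits = []
    for m in ANCHOR_MOTIF.finditer(sequence):
        hits.append((m.start(), m.group(), len(sequence) - m.end()))
    return hits

# Hypothetical toy sequence: short N-terminal stretch, hexapeptide,
# hydrophobic stretch, then a basic cluster near the C-terminus.
toy = "MKKTTAQDS" + "LPETGE" + "AIVLGAALLAGLFFL" + "KRKEEN"
print(find_anchor_hexapeptides(toy))
```

A match far from the C-terminus would be a weaker candidate for the anchoring role described above than one followed only by a short hydrophobic stretch and basic residues.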

[1862] [ 1] Schneewind O., Jones K.F., Fischetti V.A. J. Bacteriol. 172:3310-3317(1990).

[1863] 782. Gamma-glutamyltranspeptidase [...] gamma-glutamyl moiety

[ 1] Tate S.S., Meister A. Meth. Enzymol. 113:400-419(1985).
[ 2] Suzuki H., Kumagai H., Echigo T., Tochikura T. J. Bacteriol. 171:5169-5172(1989).
[ 3] Ishiye M., Niwa M. Biochim. Biophys. Acta 1132:233-239(1992).

[ 1] Labbe-Bois R. J. Biol. Chem. 265:7278-7283(1990).
[ 2] tenner D.A., Frasier F. Proc. Natl. Acad. Sci. U.S.A. [...]
[ 3] Miyamoto K., Nakahigashi K., Nishimura K., Inokuchi H. J. Mol. Biol. 219:393-398(1991).
[1874] 784. Cellulose-binding domain, bacterial type

[1875] The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases
(EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1].
[1876] Structurally, cellulases and xylanases generally consist of a catalytic domain joined to a cellulose-binding
domain (CBD) by a short linker sequence rich in proline and/or hydroxy-amino acids.

[1877] The CBD of a number of bacterial cellulases has been shown to consist of about 105 amino acid residues
[2]. Enzymes known to contain such a domain are:

- Endoglucanase (gene end1) from Butyrivibrio fibrisolvens.
- Endoglucanases A (gene cenA) and B (cenB) from Cellulomonas fimi.
- Exoglucanases A (gene cbhA) and B (cbhB) from Cellulomonas fimi.
- Endoglucanase E-2 (gene celB) from Thermomonospora fusca.
- Endoglucanase A (gene celA) from Microbispora bispora.
- Endoglucanases A (gene celA), B (celB) and C (celC) from Pseudomonas fluorescens.
- Endoglucanase A (gene celA) from Streptomyces lividans.
- Exocellobiohydrolase (gene cex) from Cellulomonas fimi.
- Xylanases A (gene xynA) and B (xynB) from Pseudomonas fluorescens.
- Arabinofuranosidase C (EC 3.2.1.55) (xylanase C) (gene xynC) from Pseudomonas fluorescens.
- Chitinase 63 (EC 3.2.1.14) from Streptomyces plicatus.
- Chitinase C from Streptomyces lividans.



[1878] The CBD domain is found either at the N-terminal or at the C-terminal extremity of these enzymes. As
shown in the following schematic representation, there are two conserved cysteines in this CBD domain - one at each
extremity of the domain - which have been shown [3] to be involved in a disulfide bond. There are also four conserved
tryptophan residues which could be involved in the interaction of the CBD with polysaccharides.



 +-------------------------------------------------+
 |                                                 |
xCxxxxWxxxxxNxxxWxxxxxxxWxxxxxxxxWNxxxxxGxxxxxxxxxxCx
                                 **************
'C': conserved cysteine involved in a disulfide bond.
'*': position of the pattern.



Consensus pattern: W-N-[STAGR]-[STDN]-[LIVM]-x(2)-[GST]-x-[GST]-x(2)-[LIVMFT]-[GA]

[ 1] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).
[ 2] Meinke A., Gilkes N.R., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Protein Seq. Data Anal. 4:349-353(1991).
[ 3] Gilkes N.R., Claeyssens M., Aebersold R., Henrissat B., Meinke A., Morrison H.D., Kilburn D.G., Warren R.A.J.,
Miller R.C. Jr. Eur. J. Biochem. 202:367-377(1991).

[1879] 785. Amidases signature 

[1880] It has been shown [1,2,3] that several enzymes from various prokaryotic and eukaryotic organisms which are
involved in the hydrolysis of amides (amidases) are evolutionarily related. These enzymes are listed below.

- Indoleacetamide hydrolase (EC 3.5.1.-), a bacterial plasmid-encoded enzyme that catalyzes the hydrolysis of
indole-3-acetamide (IAM) into indole-3-acetate (IAA), the second step in the biosynthesis of auxins from tryptophan.
- Acetamidase from Emericella nidulans (gene amdS), an enzyme which allows acetamide to be used as a sole
carbon or nitrogen source.
- Amidase (EC 3.5.1.4) from Rhodococcus sp. N-774 and Brevibacterium sp. R312 (gene amdA). This enzyme
hydrolyzes propionamides efficiently and, at a lower efficiency, acetamide, acrylamide and indoleacetamide.
- Amidase (EC 3.5.1.4) from Pseudomonas chlororaphis.
- 6-aminohexanoate-cyclic-dimer hydrolase (EC 3.5.2.12) (nylon oligomers degrading enzyme EI) (gene nylA), a
bacterial plasmid-encoded enzyme which catalyzes the first step in the degradation of 6-aminohexanoic acid cyclic
dimer, a by-product of nylon manufacture [4].
- Glutamyl-tRNA(Gln) amidotransferase subunit A [5].
- Mammalian fatty acid amide hydrolase (gene FAAH) [6].
- A putative amidase from yeast (gene AMD2).
- Mycobacterium tuberculosis putative amidases amiA2, amiB2, amiC and amiD.

[1881] All these enzymes contain in their central section a highly conserved region rich in glycine, serine, and alanine
residues. This region has been used as a signature pattern.

Consensus pattern: G-[GA]-S-[GS]-[GS]-G-x-[GSA]-[GSAVY]-x-[LIVM]-[GSA]-x(6)-[GSAT]-x-[GA]- ... -[LIVM]-R-x-P-[GSAC]

[ 1] Mayaux J.-F., Cerbelaud E., Soubrier F., Faucher D., Petre D. J. Bacteriol. 172:6764-6773(1990).
[ 2] Hashimoto Y., Nishiyama M., Ikehata O., Horinouchi S., Beppu T. Biochim. Biophys. Acta 1088:225-233(1991).
[ 3] Chang T.-H., Abelson J. Nucleic Acids Res. 18:7180-7180(1990).
[ 4] Tsuchiya K., Fukuyama S., Kanzaki N., Kanagawa K., Negoro S., Okada H. J. Bacteriol. 171:3187-3191(1989).
[ 5] Curnow A.W., Hong K.W., Yuan R., Kim S.I., Martins O., Winkler W., Henkin T.M., Soll D. Proc. Natl. Acad. [...]
[ 6] Cravatt B.F., Giang D.K., Mayfield S.P., Boger D.L., Lerner R.A., Gilula N.B. Nature 384:83-87(1996).
[1882] 786. Glycosyl hydrolases family 10 active site

[1883] The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases
(EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1,2]. Fungi and bacteria
produce a spectrum of cellulolytic enzymes (cellulases) and xylanases which, on the basis of sequence similarities,
can be classified into families. One of these families is known as the cellulase family F [3] or as the glycosyl hydrolases
family 10 [4,E1]. The enzymes which are currently known to belong to this family are listed below.

- Aspergillus awamori xylanase A (xynA).
- Bacillus sp. strain 125 xylanase (xynA).
- Bacillus stearothermophilus xylanase.
- Butyrivibrio fibrisolvens xylanases A (xynA) and B (xynB).
- Caldocellum saccharolyticum bifunctional endoglucanase/exoglucanase (celB). This protein consists of two domains;
it is the N-terminal domain, which has exoglucanase activity, which belongs to this family.
- Caldocellum saccharolyticum xylanase A (xynA).
- Caldocellum saccharolyticum ORF4. This hypothetical protein is encoded in the xynABC operon and is probably
a xylanase.
- Cellulomonas fimi exoglucanase/xylanase (cex).
- Clostridium stercorarium thermostable celloxylanase.
- Clostridium thermocellum xylanases Y (xynY) and Z (xynZ).
- Cryptococcus albidus xylanase.
- Penicillium chrysogenum xylanase (gene xylP).
- Pseudomonas fluorescens xylanases A (xynA) and B (xynB).
- Ruminococcus flavefaciens bifunctional xylanase XYLA (xynA). This protein consists of three domains: an N-terminal
xylanase catalytic domain that belongs to family 11 of glycosyl hydrolases; a central domain composed of
short repeats of Gln, Asn and Trp; and a C-terminal xylanase catalytic domain that belongs to family 10 of glycosyl
hydrolases.
- Streptomyces lividans xylanase A (xlnA).
- Thermoanaerobacter saccharolyticum endoxylanase A (xynA).
- Thermoascus aurantiacus xylanase.
- Thermophilic bacterium Rt8.B4 xylanase (xynA).

[1884] One of the conserved regions in these enzymes is centered on a conserved glutamic acid residue which has
been shown [5], in the exoglucanase from Cellulomonas fimi, to be directly involved in glycosidic bond cleavage by
acting as a nucleophile. This region has been used as a signature pattern.

[1885] Consensus pattern: [GTA]-x(2)-[LIVN]-x-[IVMF]-[ST]-E-[LIY]-[DN]-[LIVMF] [E is the active site residue]

[ 1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990).
[ 2] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).
[ 3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989).
[ 4] Henrissat B. Biochem. J. 280:309-316(1991).
[ 5] Tull D., Withers S.G., Gilkes N.R., Kilburn D.G., Warren R.A.J., Aebersold R. J. Biol. Chem. 266:15621-15625(1991).

[1886] 787. Fructose-bisphosphate aldolase class-II signatures

[1887] Fructose-bisphosphate aldolase (EC 4.1.2.13) [1,2] is a glycolytic enzyme that catalyzes the reversible aldol
cleavage or condensation of fructose-1,6-bisphosphate into dihydroxyacetone-phosphate and glyceraldehyde 3-phosphate.
There are two classes of fructose-bisphosphate aldolases with different catalytic mechanisms. Class-II aldolases
[2], mainly found in prokaryotes and fungi, are homodimeric enzymes which require a divalent metal ion - generally
zinc - for their activity.

[1888] This family also includes the following proteins: 

- Escherichia coli galactitol operon protein gatY, which catalyzes the transformation of tagatose 1,6-bisphosphate
into glycerone phosphate and D-glyceraldehyde 3-phosphate.
- Escherichia coli N-acetyl galactosamine operon protein agaY, which catalyzes the same reaction as that of gatY.

[1889] As signature patterns for this class of enzyme, two conserved regions were selected. The first pattern is
located in the first half of the sequence and contains two histidine residues that have been shown [4] to be involved in
binding a zinc ion. The second is located in the C-terminal section and contains clustered acidic residues and glycines.
[1890] Consensus pattern: [FYVMT]-x(1,3)-[LIVMH]-[APN]-[LIVM]-x(1,2)-[LIVM]-H-x-D-H-[GACH] [The two H's are
zinc ligands]

Consensus pattern: [LIVM]-E-x-E-[LIVM]-G-x(2)-[GM]-[GSTA]-x-E
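Where an entry gives two signature patterns, a sequence is a stronger candidate for the family when both are found. The Python sketch below checks a sequence against the two class-II aldolase patterns, written as regular expressions. The OCR-damaged brackets in the first pattern are read here as [FYVMT] and [LIVM], which is an assumption; the function name, the dictionary keys and the toy sequence are hypothetical and chosen only for illustration.

```python
import re

# [FYVMT]-x(1,3)-[LIVMH]-[APN]-[LIVM]-x(1,2)-[LIVM]-H-x-D-H-[GACH]
SIGNATURE_1 = re.compile(r"[FYVMT].{1,3}[LIVMH][APN][LIVM].{1,2}[LIVM]H.DH[GACH]")
# [LIVM]-E-x-E-[LIVM]-G-x(2)-[GM]-[GSTA]-x-E
SIGNATURE_2 = re.compile(r"[LIVM]E.E[LIVM]G.{2}[GM][GSTA].E")

def check_signatures(sequence: str) -> dict:
    """Report where each signature first matches and whether both are present."""
    m1 = SIGNATURE_1.search(sequence)
    m2 = SIGNATURE_2.search(sequence)
    return {
        "zinc_site": m1.span() if m1 else None,          # region with the two His
        "acidic_gly_region": m2.span() if m2 else None,  # C-terminal region
        "both": bool(m1 and m2),
    }

# Hypothetical toy sequence carrying one copy of each signature.
toy = "GS" + "FKALAVQIHLDHG" + "Q" * 20 + "LEKEIGTTGSPE" + "KK"
print(check_signatures(toy))
```

Requiring both matches trades sensitivity for specificity: a divergent family member that has lost one region would be reported with `both` set to False.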

[ 1] Perham R.N. Biochem. Soc. Trans. 18:185-187(1990).
[ 2] Marsh J.J., Lebherz H.G. Trends Biochem. Sci. 17:110-113(1992).
[ 3] von der Osten C.H., Barbas C.F. III, Wong C.-H., Sinskey A.J. Mol. Microbiol. 3:1625-1637(1989).
[ 4] Berry A., Marshall K.E. FEBS Lett. 318:11-16(1993).

[1891] 788. Prolyl oligopeptidase family serine active site 

[1892] The prolyl oligopeptidase family [1,2,3] consists of a number of evolutionarily related peptidases whose catalytic
activity seems to be provided by a charge relay system similar to that of the trypsin family of serine proteases, but
which evolved by independent convergent evolution. The known members of this family are listed below.

- Prolyl endopeptidase (EC 3.4.21.26) (PE) (also called post-proline cleaving enzyme). PE is an enzyme that cleaves
peptide bonds on the C-terminal side of prolyl residues. The sequence of PE has been obtained from a mammalian
species (pig) and from bacteria (Flavobacterium meningosepticum and Aeromonas hydrophila); there is a high
degree of sequence conservation between these sequences.
- Escherichia coli protease II (EC 3.4.21.83) (oligopeptidase B) (gene prtB), which cleaves peptide bonds on the
C-terminal side of lysyl and argininyl residues.
- Dipeptidyl peptidase IV (EC 3.4.14.5) (DPP IV). DPP IV is an enzyme that removes N-terminal dipeptides sequentially
from polypeptides having unsubstituted N-termini provided that the penultimate residue is proline.
- Yeast vacuolar dipeptidyl aminopeptidase A (DPAP A) (gene: STE13), which is responsible for the proteolytic maturation
of the alpha-factor precursor.
- Yeast vacuolar dipeptidyl aminopeptidase B (DPAP B) (gene: DAP2).
- Acylamino-acid-releasing enzyme (EC 3.4.19.1) (acyl-peptide hydrolase). This enzyme catalyzes the hydrolysis
of the amino-terminal peptide bond of an N-acetylated protein to generate an N-acetylated amino acid and a protein
with a free amino-terminus.

[1893] A conserved serine residue has experimentally been shown (in E. coli protease II as well as in pig and bacterial
PE) to be necessary for the catalytic mechanism. This serine, which is part of the catalytic triad (Ser, His, Asp), is
generally located about 150 residues away from the C-terminal extremity of these enzymes (which are all proteins that
contain about 700 to 800 amino acids).

[1894] Consensus pattern: D-x(3)-A-x(3)-[LIVMFYW]-x(14)-G-x-S-x-G-G-[LIVMFYW](2) [S is the active site residue]
[1895] Note: these proteins belong to families S9A/S9B/S9C in the classification of peptidases [4,E1].
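Since the pattern is annotated with the position of the active site residue, a scan can report that residue directly instead of only the span of the match. The Python sketch below writes the pattern as a regular expression with a capturing group around the serine and returns its 1-based position together with its distance from the C-terminus. The damaged brackets in the source pattern are read as [LIVMFYW], which is an assumption; the function name and the toy sequence are hypothetical, and the toy sequence is far shorter than the 700 to 800 residues of the real enzymes.

```python
import re

# D-x(3)-A-x(3)-[LIVMFYW]-x(14)-G-x-S-x-G-G-[LIVMFYW](2), serine captured.
SIGNATURE = re.compile(r"D.{3}A.{3}[LIVMFYW].{14}G.(S).GG[LIVMFYW]{2}")

def active_site_serines(sequence: str):
    """Return (position, distance_from_c_terminus) for each matched serine.

    position is 1-based; the distance is the number of residues that
    follow the serine in the sequence.
    """
    results = []
    for m in SIGNATURE.finditer(sequence):
        position = m.start(1) + 1
        results.append((position, len(sequence) - position))
    return results

# Hypothetical toy sequence with a single occurrence of the signature.
toy = "MT" + "DKLEAQRTL" + "N" * 14 + "GWSQGGLV" + "E" * 10
print(active_site_serines(toy))
```

On a real sequence, the reported distance could be compared with the roughly 150 residues mentioned in the text as a plausibility check on a match.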

[ 1] Rawlings N.D., Polgar L., Barrett A.J. Biochem. J. 279:907-911(1991).
[ 2] Barrett A.J., Rawlings N.D. Biol. Chem. Hoppe-Seyler 373:353-360(1992).
[ 3] Polgar L., Szabo E. Biol. Chem. Hoppe-Seyler 373:361-366(1992).
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

In eukaryotes the FTHFS activity is expressed by a multifunctional enzyme, C-1-tetrahydrofolate synthase
(C1-THF synthase), which also catalyzes the dehydrogenase and cyclohydrolase activities. Two forms of C1-THF
synthases are known [...] is located in the mitochondrial matrix, while the second one is cytoplasmic. In both forms
the FTHFS domain consists of about 600 amino acid residues and is located in the C-terminal [...]
in prokaryotes FTHFS activity is expressed by a monofunctional homotetrameric enzyme of about 560 amino [...]

[1899] The sequence of FTHFS is highly conserved in all forms of the enzyme. As signature patterns two regions [...]
section of FTHFS.

[1900] Consensus pattern: G-[LIVM]-K-G-G-A-A-G-G-G-Y
Consensus pattern: V-A-T-[IV]-R-A-L-K-x-[HN]-G-G

[ 1] Shannon K.W., Rabinowitz J.C. J. Biol. Chem. 263:7717-7725(1988).
[ 2] Lovell C.R., Przybyla A., Ljungdahl L.G. Biochemistry 29:5687-5694(1990).

belong to this family: 

- Escherichia coli hypothetical protein yedX. 

- Bacillus subtilis hypothetical protein yunM. 

- Caenorhabditis elegans hypothetical protein R09H10.3. 

- Caenorhabditis elegans hypothetical protein ZK697.8. 

[1904] Two regions were selected as signature patterns. The first, located in the N-terminal extremity, starts with a
lysine known to be involved in binding T4. The second pattern is located in the C-terminal extremity.

[1905] Consensus pattern: [KH]-[IV]-L-[DN]-x(3)-G-x-P-A-x(2)-[IV]-x-[IV] [The K binds thyroxine]

Consensus pattern: Y-[TH]-[IV]-[AP]-x(2)-L-S-[PQ]-[FYW]-[GS]-[FY]-[QS]

[1906] [ 1] Schreiber G., Richardson S.J. Comp. Biochem. Physiol. 116B:137-160(1997).

the target of a variety of antimicrobial agents such as [...] of 6-hydroxymethyl-7,8-dihy-[...]

[...] DHPS is the C-terminal domain of a multifunctional folate synthesis enzyme (gene fas) [2].

[1911] Two signature patterns for DHPS were developed; the first signature is located in the N-terminal section of
these enzymes, while the second signature is located in the central section.

[1912] Consensus pattern: [LIVM]-x-[AG]-[LIVMF](2)-N-x-T-x-D-S-F-x-D-x-[SG]
Consensus pattern: [GE]-[SA]-x-[LIVM](2)-D-[LIVM]-G-[GP]-x(2)-[STA]-x-P

[ 1] Slock J., Stahly D.P., Han C.-Y., Six E.W., Crawford I.P. J. Bacteriol. 172:7211-7226(1990).
[ 2] Volpe F., Dyer M., Scaife J.G., Darby G., Stammers D.K., Delves C.J. Gene 112:213-218(1992).

[1913] 792. Phosphatidylinositol 3- and 4-kinases signatures

[1914] Phosphatidylinositol 3-kinase (PI3-kinase) (EC 2.7.1.137) [1] is an enzyme that phosphorylates phosphoinositides
on the 3-hydroxyl group of the inositol ring. The exact function of the three products of PI3-kinase - PI-3-P,
PI-3,4-P(2) and PI-3,4,5-P(3) - is not yet known, although it is proposed that they function as second messengers in
cell signalling. Currently, three forms of PI3-kinase are known:

- The mammalian enzyme, which is a heterodimer of a 110 Kd catalytic chain (p110) and an 85 Kd subunit (p85)
which allows it to bind to activated tyrosine protein kinases. There are at least two different types of p110 subunits
(alpha and beta).
- Yeast TOR1/DRR1 and TOR2/DRR2 [2], PI3-kinases required for cell cycle activation. Both are proteins of about
280 Kd.
- Yeast VPS34 [3], a PI3-kinase involved in vacuolar sorting and segregation. VPS34 is a protein of about 100 Kd.
- Arabidopsis thaliana and soybean VPS34 homologs.

[1915] Phosphatidylinositol 4-kinase (PI4-kinase) (EC 2.7.1.67) [4] is an enzyme that acts on phosphatidylinositol
(PI) in the first committed step in the production of the second messenger inositol-1,4,5-trisphosphate. Currently the
following forms of PI4-kinases are known:

- Human PI4-kinase alpha.
- Yeast PIK1, a nuclear protein of 120 Kd.
- Yeast STT4, a protein of 214 Kd.

[1916] The PI3- and PI4-kinases share a well conserved domain at their C-terminal section; this domain seems to
be distantly related to the catalytic domain of protein kinases [2]. Two signature patterns were developed from the best
conserved parts of this domain.

[1917] Four additional proteins belong to this family: 

- Mammalian FKBP-rapamycin associated protein (FRAP) [5], which acts as the target for the cell-cycle arrest and
immunosuppressive effects of the FKBP12-rapamycin complex.
- Yeast protein ESR1 [6], which is required for cell growth, DNA repair and meiotic recombination.
- Yeast protein TEL1, which is involved in controlling telomere length.
- Yeast hypothetical protein YHR099w, a distantly related member of this family.
- Fission yeast hypothetical protein SpAC22E12.16C.

[1918] Consensus pattern: [LIVMFAC]-K-x(1,3)-[DEA]-[DE]-[LIVMC]-R-Q-[DE]-x(4)-Q

Consensus pattern: [GS]-x-[AV]-x(3)-[LIVM]-x(2)-[FYH]-[LIVM](2)-x-[LIVMF]-x-D-R-H-x(2)-N

[ 1] Hiles I.D., Otsu M., Volinia S., Fry M.J., Gout I., Dhand R., Panayotou G., Ruiz-Larrea F., Thompson A., Totty
N.F., Hsuan J.J., Courtneidge S.A., Parker P.J., Waterfield M.D. Cell 70:419-429(1992).
[ 2] Kunz J., Henriquez R., Schneider U., Deuter-Reinhard M., Movva N.R., Hall M.N. Cell 73:585-596(1993).
[ 3] Schu P.V., Takegawa K., Fry M.J., Stack J.H., Waterfield M.D., Emr S.D. Science 260:88-91(1993).
[ 4] Garcia-Bustos J.F., Marini F., Stevenson I., Frei C., Hall M.N. EMBO J. 13:2352-2361(1994).
[ 5] Brown E.J., Albers M.W., Shin T.B., Ichikawa K., Keith C.T., Lane W.S., Schreiber S.L. Nature 369:756-758(1994).
[ 6] Kato R., Ogawa H. Nucleic Acids Res. 22:3104-3112(1994).

[1919] 793. FAD-dependent glycerol-3-phosphate dehydrogenase signatures 

[1920] FAD-dependent glycerol-3-phosphate dehydrogenase (EC 1.1.99.5) (GPD) catalyzes the conversion of
glycerol-3-phosphate into dihydroxyacetone phosphate. In bacteria [1] it is associated with the utilization of glycerol coupled
to respiration. In Escherichia coli, two isozymes are known: one expressed under anaerobic conditions (gene glpA)
and one in aerobic conditions (gene glpD). In eukaryotes, a mitochondrial form of GPD participates in the glycerol
phosphate shuttle in conjunction with an NAD-dependent cytoplasmic GPD (EC 1.1.1.8) [2,3].
[1921] These enzymes are proteins of about 60 to 70 Kd which contain a probable FAD-binding domain in their
N-terminal extremity. The mammalian enzyme differs from the bacterial or yeast proteins by having an EF-hand
calcium-binding region (see <PDOC00018>) in its C-terminal extremity.

[...] corresponds to a conserved region in the central part of these enzymes.
[1923] Consensus pattern: [IV]-G-G-G-x(2)-G-[STACV]-G-x-A-x-D-x(3)-R-G
Consensus pattern: G-G-K-x(2)-[GSTE]-Y-R-x(2)-A

[ 1] Austin D., Larson T.J. J. Bacteriol. 173:101-107(1991).
[ 2] Roennow B., Kielland-Brandt M.C. Yeast 9:1121-1130(1993).
[ 3] Brown L.J., MacDonald M.J., Lehn D.A., Moran S.M. J. Biol. Chem. 269:14363-14366(1994).

[1924] 794. NOL1/NOP2/sun family signature 

[1925] The following proteins seem to be evolutionarily related:

- Mammalian proliferating-cell nucleolar antigen p120 (gene NOL1), which may play a role in the regulation of the
cell cycle and the increased nucleolar activity that is associated with the cell proliferation.
- Yeast nucleolar protein NOP2 (or YNA1), which could be involved in nucleolar function during the onset of growth,
and in the maintenance of nucleolar structure.
- Yeast hypothetical protein YBL024W.
- Bacterial protein sun (also known as fmu).
- Escherichia coli hypothetical protein yebU.
- Mycobacterium tuberculosis hypothetical protein MtCY21B4.24.
- Methanococcus jannaschii hypothetical protein MJ0026.

NOL1 is a protein of 855 residues, NOP2 consists of 618 residues, YBL024w of 684, sun is a protein of about 4[...]50
residues, and MJ0026 has 274 residues. They share a conserved central domain which contains some highly conserved
regions. One of these regions was selected as a signature pattern.
[1926] Consensus pattern: [FV]-D-[KRA]-[LIVMA]-L-x-D-[AV]-P-C-[ST]-[GA]

related. These proteins are: 

- Bacterial and archaebacterial protein moaA, which is involved in the biosynthesis of the molybdenum cofactor (mo-[...]
- Arabidopsis thaliana cnx2, a protein involved in molybdopterin biosynthesis and which is highly similar to moaA.
- Bacillus subtilis narA, which seems to be the moaA ortholog in that bacterium.
- Bacterial protein [...] (or fixZ), which is involved in the biosynthesis of the nitrogenase iron-molybdenum cofactor.
- Bacterial protein pqqE, which is involved in the biosynthesis of the cofactor pyrroloquinoline-quinone [...]
- Pyrococcus furiosus cmo, a protein involved in the synthesis of a molybdopterin-based tungsten cofactor.
- Caenorhabditis elegans hypothetical protein F49E2.1.

[1929] All these proteins share, in their N-terminal region, a conserved domain that contains [...] cysteines. In
[...] these cysteines have been shown [1] to be important for the biological activity. They could be involved in the [...]

Consensus pattern: [...]

[ 1] Menendez C., Igloi G., Henninger H., Brandsch R. Arch. Microbiol. 164:142-151(1995).
[ 2] Hoff T., Schnorr K.M., Meyer C., Caboche M. J. Biol. Chem. 270:6100-6107(1995).

[1931] 796. Forkhead-associated (FHA) domain profile

[1932] The forkhead-associated (FHA) domain [1,E1] is a putative nuclear signalling domain found in a variety of
otherwise unrelated proteins. The FHA domain comprises approximately 55 to 75 amino acids and contains three highly
conserved blocks separated by divergent spacer regions. Currently it has been found in the following proteins:

- Four transcription factors that also contain a forkhead (FH) domain: mouse [...] yeast
transcription factor FHL1, which probably controls pre-rRNA processing, and yeast FKH1 and FKH2. In those
proteins the FHA domain is located N-terminal of the DNA-binding FH domain.
- Kinase-associated protein phosphatase (KAPP) from Arabidopsis thaliana, a protein which specifically interacts
with the receptor-type Ser/Thr-kinase RLK5. In KAPP, the FHA domain maps to a region that interacts with the
receptor-type protein kinase RLK5 only if the kinase is phosphorylated on serine residues [2].
- Two protein kinases from yeast that are involved in mediating the nuclear response to DNA damage: DUN1 and
SPK1/SAD1 [3]. The latter is the only known protein containing two copies of the FHA domain.
- Protein kinase cds1 from fission yeast, which contains an FHA domain and might be the ortholog of SPK1.
- Protein kinase MEK1 from yeast, which is involved in meiotic recombination.
- Human nuclear antigen Ki67, which is expressed only in proliferating cells.
- Yeast hypothetical protein YHR115c, which contains a RING-finger C-terminal of the FHA domain.
- Yeast hypothetical proteins L8083.1 and 9346.10, which contain an extensive coiled-coil region C-terminal of the
FHA domain.
- Caenorhabditis elegans hypothetical protein ZK632.2.
- Caenorhabditis elegans hypothetical protein C01G6.5.
- FraH from the prokaryote Anabaena, which contains a zinc-finger motif N-terminal of the FHA domain.
- An ORF from the bacterium Streptomyces, which is on the opposite strand of the protein kinase pks1, overlapping
the ORF of the kinase.

[ 1] Hofmann K.O., Bucher P. Trends Biochem. Sci. 20:347-349(1995).
[ 2] Stone J.M., Collinge M.A., Smith R.D., Horn M.A., Walker J.C. Science 266:793-795(1994).
[ 3] Navas T.A., Zhou Z., Elledge S.J. Cell 80:29-39(1995).

[1933] 797. Ald_Xan_dh_C

Aldehyde oxidase and xanthine dehydrogenase, C terminus

[1934] [1] Romao MJ, Archer M, Moura I, Moura JJ, LeGall J, Engh R, Schneider M, Hof P, Huber R; Medline:
96072968; "Crystal structure of the xanthine oxidase-related aldehyde oxido-reductase from D. gigas." Science 1995;
270:1170-1176.

Number of members: 54 

[1935] 798. Glyco_hydro_38
Glycosyl hydrolases family 38

[1936] Glycosyl hydrolases are key enzymes of carbohydrate metabolism. 

Number of members: 20 

[1937] [1] Henrissat B; Medline: 98313424; "Glycosidase families." Biochem Soc Trans 1998;26:153-156.
[1938] 799. HECT 
HECT-domain (ubiquitin-transferase). 

[1939] The name HECT comes from Homologous to the E6-AP Carboxyl Terminus. 

Number of members: 43

[1940] [1] Huibregtse JM, Scheffner M, Beaudenon S, Howley PM; Medline: 95223981; "A family of proteins structurally
and functionally related to the E6-AP ubiquitin-protein ligase." Proc Natl Acad Sci U S A 1995;92:2563-2567.
[1941] 800. HRDC
HRDC domain

[1942] The HRDC (Helicase and RNase D C-terminal) domain has a putative role in nucleic acid binding. Mutations 
in the HRDC domain cause human disease. 

Number of members: 19 

[1943] [1] Morozov V, Mushegian AR, Koonin EV, Bork P; Medline: 98060076; "A putative nucleic acid-binding domain
in Bloom's and Werner's syndrome helicases." Trends Biochem Sci 1997;22:417-418.
[1944] 801. Integrase 

[1945] Integrase mediates integration of a DNA copy of the viral genome into the host chromosome. Integrase is
composed of three domains. The amino-terminal domain is a zinc binding domain. The central domain is the catalytic
domain [1]. The carboxyl terminal domain is a DNA binding domain [2].
Number of members: 581
[1946]

[1] Dyda F, Hickman AB, Jenkins TM, Engelman A, Craigie R, Davies DR; Medline: 95099322; "Crystal structure
of the catalytic domain of HIV-1 integrase: similarity to other polynucleotidyl transferases." Science 1994;266.

[2] Lodi PJ, Ernst JA, Kuszewski J, Hickman AB, Engelman A, Craigie R, Clore GM, Gronenborn AM; Medline:
95359147; "Solution structure of the DNA binding domain of HIV-1 integrase." Biochemistry 1995;34:9826-9833.

[1947] 802. lig_chan

[...] the four transmembrane [...]tors.

Number of members: 128 

[1949] [1] Tong G, Shepherd D, Jahr CE; Medline: 95184014; "Synaptic desensitization of NMDA receptors by
calcineurin." Science 1995;267:1510-1512.
[1950] 803. RhoGAP 
RhoGAP domain 

[1951] GTPase activator proteins towards Rho/Rac/Cdc42-like small GTPases. 

Number of members: 97 

[1952]

[1] Musacchio A, Cantley LC, Harrison SC; Medline: 97121392; "Crystal structure of the breakpoint cluster region-
homology domain from phosphoinositide 3-kinase p85 alpha subunit." Proc Natl Acad Sci U S A 1996;93.

[2] Barrett T, Xiao B, Dodson EJ, Dodson G, Ludbrook SB, Nurmahomed K, Gamblin SJ, Musacchio A, Smerdon
SJ, Eccleston JF; Medline: 97162209; "The structure of the GTPase-activating domain from p50rhoGAP." Nature

[3] Rittinger K, Walker PA, Eccleston JF, Nurmahomed K, Owen D, Laue E, Gamblin SJ, Smerdon SJ; Medline:
97404320; "Crystal structure of a small G protein in complex with the GTPase-activating protein rhoGAP." Nature

[4] Boguski MS, McCormick F; Medline: 94081948; "Proteins regulating Ras and its relatives." Nature 1993;366:
643-654.



[1953] 804. vwd

[...] "The modular architecture of a new family of growth regulators related to
connective tissue growth factor." FEBS Lett 1993;327:125-130.

Number of members: 92 

[1955] 805. zf-C4_Topoisom 
Topoisomerase DNA binding C4 zinc finger 

[1] Tse-Dinh YC, Beran-Steed RK; Medline: 89034032; "Escherichia coli DNA topoisomerase I is a zinc metallo-[...]

[...] of a high-affinity DNA binding domain." Biochem Biophys Res Commun 1998;251:509-514.

Number of members: 51 

[1956] 806. AIRC 
AIR carboxylase 
Members of this family catalyse the decarboxylation of 1-(5-phosphoribosyl)-5-amino-4-imidazole-carboxylate (AIR).
This family catalyses the sixth step of de novo purine biosynthesis. Some members of this family contain two copies of
this domain. Number of members: 35
[1957] 807. Bromodomain signature and profile
PROSITE cross-reference(s): PS00633; BROMODOMAIN_1, PS50014;
BROMODOMAIN_2

The bromodomain [1,2,3] is a conserved region of about 70 amino acids found in the following proteins: 

- Higher eukaryotes transcription initiation factor TFIID 250 Kd subunit (TBP-associated factor p250) (gene CCG1).
P250 is associated with the TFIID TATA-box binding protein and seems essential for progression of the G1 phase
of the cell cycle.
- Human RING3, a protein of unknown function encoded in the MHC class II locus.
- Mammalian CREB-binding protein (CBP), which mediates cAMP-gene regulation by binding specifically to
phosphorylated CREB protein.
- Drosophila female sterile homeotic protein (gene fsh), required maternally for proper expression of other homeotic
genes involved in pattern formation, such as Ubx.
- Drosophila brahma protein (gene brm), a protein required for the activation of multiple homeotic genes.
- Mammalian homologs of brahma. In human, three brahma-like proteins are known: SNF2a (hBRM), SNF2b, and
BRG1.
- Human BS69, a protein that binds to adenovirus E1A and inhibits E1A transactivation.
- Human peregrin (or Br140).
- Yeast BDF1 [3], a transcription factor involved in the expression of a broad class of genes including snRNAs.
- Yeast GCN5, a general transcriptional activator operating in concert with certain other DNA-binding transcriptional
activators, such as GCN4, HAP2/3/4 or ADA2.
- Yeast NPS1/STH1, involved in G(2) phase control in mitosis.
- Yeast SNF2/SWI2, which is part of a complex with the SNF5, SNF6, SWI3 and ADR6/SWI1 proteins. This
SWI-complex is involved in transcriptional activation.
- Yeast SPT7, a transcriptional activator of Ty elements and possibly other genes.
- Caenorhabditis elegans protein cbp-1.
- Yeast hypothetical protein YGR056w.
- Yeast hypothetical protein YKR008w.
- Yeast hypothetical protein L9638.1.

[1958] Some proteins contain a region which, while similar to some extent to a classical bromodomain, diverges from 
it by either lacking part of the domain or because of an insertion. These proteins are: 

- Mammalian protein HRX (also known as ALL-1 or MLL), a protein involved in translocations leading to acute leukemias and which possibly acts as a transcriptional regulatory factor. HRX contains a region similar to the C-terminal half of the bromodomain.
- Caenorhabditis elegans hypothetical protein ZK783.4. The bromodomain of this protein has a 23 amino-acid insertion.
- Yeast protein YTA7. This protein contains a region with significant similarity to the C-terminal half of the bromodomain. As it is a member of the AAA family (see <PDOC00572>) it is also in a functionally different context.

[1959] The above proteins generally contain a single bromodomain, but some of them contain two copies; this is the case for BDF1, CCG1, fsh, RING3, YKR008w and L9638.1.

[1960] The exact function of this domain is not yet known but it is thought to be involved in protein-protein interactions and it may be important for the assembly or activity of multicomponent complexes involved in transcriptional activation.

[1961] The consensus pattern that has been developed spans a major part of the bromodomain; a more sensitive detection is available through the use of a profile which spans the whole domain.
Consensus pattern: [STANVF]-x(2)-F-x(4)-[DNS]-x(5,7)-[DENQTF]-Y-[HFY]-x(2)-[LIVMFY]-x(3)-[LIVM]-x(4)-[LIVM]-x(6,8)-Y-x(12,13)-[LIVM]-x(2)-N-[SACF]-x(2)-[FY]
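The consensus patterns in this section use the PROSITE notation: elements are separated by hyphens, x stands for any residue, square brackets list the residues allowed at a position, and a number or range in parentheses gives a repeat count. As a minimal sketch (not part of the original disclosure), the following Python function translates such a pattern into a regular expression. The function name prosite_to_regex and the constant names are illustrative assumptions, and only the notation elements that occur in this section, plus the curly-brace exclusion form, are handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Hypothetical helper for illustration; handles x, [..], {..} and
    repeat counts of the form (n) or (n,m).
    """
    regex = []
    for element in pattern.replace(" ", "").rstrip(".").split("-"):
        # Split each element into its core and an optional repeat count.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # excluded residues
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        regex.append(core)
    return "".join(regex)


# Bromodomain consensus pattern as given in the text above.
BROMODOMAIN_1 = (
    "[STANVF]-x(2)-F-x(4)-[DNS]-x(5,7)-[DENQTF]-Y-[HFY]-x(2)-[LIVMFY]-x(3)-"
    "[LIVM]-x(4)-[LIVM]-x(6,8)-Y-x(12,13)-[LIVM]-x(2)-N-[SACF]-x(2)-[FY]"
)
BROMODOMAIN_REGEX = prosite_to_regex(BROMODOMAIN_1)
```

A protein sequence held in a string can then be scanned with re.search(BROMODOMAIN_REGEX, sequence). A plain regular expression of this kind only reproduces the pattern, not the more sensitive profile mentioned above.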

References
[1962]

[1] Haynes S.R., Dollard C., Winston F., Beck S., Trowsdale J., Dawid I.B. Nucleic Acids Res. 20:2603-2603(1992).
[2] Tamkun J.W., Deuring R., Scott M.P., Kissinger M., Pattatucci A.M., Kaufman T.C., Kennison J.A. Cell 68:561-572(1992).
[3] Tamkun J.W. Curr. Opin. Genet. Dev. 5:473-477(1995).
[1963] 808. (CH) Actinin-type actin-binding domain signatures

the cytoskeleton to the plasma membrane. 
■ 1" "TTfoSrAVraX fiT^^^^^^^ protein that link actin filaments to membrane glycoproteins. 

be essential for the binding of actin. 

[LIVM]-[LIVMT]-W-x-[LIVM](2)

[1] Schleicher M., Andre E., Harmann A., Noegel A.A. Dev. Genet. 9:521-530(1988).
[2] Matsudaira P. Trends Biochem. Sci. 16:87-92(1991).
[3] Dubreuil R.R. BioEssays 13:219-226(1991).

[1967] 809. (COX1) Heme-copper oxidase subunit I, copper B binding region signature PROSITE cross-reference(s):
i.rS^;SJp"ato.oxidasesmareon^^^^^^^^^^^ 

step in the respiratory chain: they transfer electrons from or the mitochondrial inner 

generate a transmembrane proton gradient across the P'^^'^^ "^^^^^^^^^^^^^^^^^^ 

llgSr. oTSS ™wc Sd. »duaas. (NOR) (n d„«,it,1.9 spiles of A,«.» Ec«.o»™. 
[1970] Enzymes that belong to this family are: 

has been found in other purple bacteria.

- Nitric oxide reductase (EC 1.7.99.7) from Pseudomonas stutzeri. NOR reduces nitrate to dinitrogen. It is a heterodimer of norC and the catalytic subunit norB. The latter contains the 6 invariant histidine residues and 12 transmembrane segments [5].

[1971] As a signature pattern the copper-binding region was used. 
[1972] Consensus pattern: [YWG]-[LIVFYWTA](2)-[VGS]-H-[LNP]-x-V-x(44,47)-H-H [The three H's are copper B ligands]

[1973] Note: cytochrome bd complexes do not belong to this family.
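Because the copper B signature [YWG]-[LIVFYWTA](2)-[VGS]-H-[LNP]-x-V-x(44,47)-H-H marks three specific histidines as ligands, a match can be used to report where those residues lie in a sequence. The sketch below is an illustration only: the regular expression is a hand translation of the pattern, the function name copper_b_ligands is hypothetical, and the test peptide is synthetic rather than a real oxidase sequence.

```python
import re

# Hand translation of the copper B consensus pattern; the three capture
# groups correspond to the three histidines described as copper B ligands.
COPPER_B = re.compile(r"[YWG][LIVFYWTA]{2}[VGS](H)[LNP].V.{44,47}(H)(H)")


def copper_b_ligands(sequence: str):
    """Return 1-based positions of the three putative ligand histidines,
    or None when the signature is not found (hypothetical helper)."""
    match = COPPER_B.search(sequence)
    if match is None:
        return None
    # match.start(n) is 0-based, so add 1 for residue numbering.
    return tuple(match.start(group) + 1 for group in (1, 2, 3))


# Synthetic peptide built to satisfy the pattern (not a real protein).
SYNTHETIC = "MM" + "WLLVHLAV" + "A" * 44 + "HH" + "KK"
LIGANDS = copper_b_ligands(SYNTHETIC)
```

For the synthetic peptide the first histidine is residue 7 and the histidine pair occupies residues 55 and 56, that is, 44 residues after the valine that closes the first part of the motif.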
[1]
Garcia-Horsman J.A., Barquera B., Rumbley J., Ma J., Gennis R.B.
J. Bacteriol. 176:5587-5600(1994).
[2]
Castresana J., Luebben M., Saraste M., Higgins D.G.
EMBO J. 13:2516-2525(1994).
[3]
Capaldi R.A., Malatesta F., Darley-Usmar V.M.
Biochim. Biophys. Acta 726:135-148(1983).
[4]
Holm L., Saraste M., Wikstrom M.
EMBO J. 6:2819-2823(1987).
[5]
Saraste M., Castresana J.
FEBS Lett. 341:1-4(1994).

[1974] 810. (dehydrog_molyb) Eukaryotic molybdopterin oxidoreductases signature PROSITE cross-reference(s): 
PS00559; MOLYBDOPTERIN_EUK 
[1975] A number of different eukaryotic oxidoreductases that require and bind a molybdopterin cofactor have been shown [1] to share a few regions of sequence similarity. These enzymes are:

- Xanthine dehydrogenase (EC 1.1.1.204), which catalyzes the oxidation of xanthine to uric acid with the concomitant reduction of NAD. Structurally, this enzyme of about 1300 amino acids consists of at least three distinct domains: an N-terminal 2Fe-2S ferredoxin-like iron-sulfur binding domain (see <PDOC00175>), a central FAD/NAD-binding domain and a C-terminal Mo-pterin domain.
- Aldehyde oxidase (EC 1.2.3.1), which catalyzes the oxidation of aldehydes into acids. Aldehyde oxidase is highly similar to xanthine dehydrogenase in its sequence and domain structure.
- Nitrate reductase (EC 1.6.6.1), which catalyzes the reduction of nitrate to nitrite. Structurally, this enzyme of about 900 amino acids consists of an N-terminal Mo-pterin domain, a central cytochrome b5-type heme-binding domain (see <PDOC00170>) and a C-terminal FAD/NAD-binding cytochrome reductase domain.
- Sulfite oxidase (EC 1.8.3.1), which catalyzes the oxidation of sulfite to sulfate. Structurally, this enzyme of about 460 amino acids consists of an N-terminal cytochrome b5-binding domain followed by a Mo-pterin domain.

[1976] There are a few conserved regions in the sequence of the molybdopterin-binding domain of these enzymes. The pattern used to detect these proteins is based on one of them. It contains a cysteine residue which could be involved in binding the molybdopterin cofactor.

[1977] Consensus pattern: [GA]-x(3)-[KRNQHT]-x(11,14)-[LIVMFYWS]-x(8)-[LIVMF]-x-C-x(2)-[DEN]-R-x(2)-[DE]
[1]
Wootton J.C., Nicolson R.E., Cock J.M., Walters D.E., Burke J.F., Doyle W.A., Bray R.C.
Biochim. Biophys. Acta 1057:157-185(1991).

811. (DNA_ligase) ATP-dependent DNA ligase signatures
PROSITE cross-reference(s): PS00697; DNA_LIGASE_A1, PS00333; DNA_LIGASE_A2
[1978] DNA ligase (polydeoxyribonucleotide synthase) is the enzyme that joins two DNA fragments by catalyzing the formation of an internucleotide ester bond between phosphate and deoxyribose. It is active during DNA replication, DNA repair and DNA recombination. There are two forms of DNA ligase: one requires ATP (EC 6.5.1.1), the other NAD (EC 6.5.1.2).

[1979] Eukaryotic, archaebacterial, virus and phage DNA ligases are ATP-dependent. During the first step of the joining reaction, the ligase interacts with ATP to form a covalent enzyme-adenylate intermediate. A conserved lysine residue is the site of adenylation [1,2].

[1980] Apart from the active site region, the only conserved region common to all ATP-dependent DNA ligases is found [3] in the C-terminal section and contains a conserved glutamate as well as four positions with conserved basic residues.

[1981] Signature patterns were developed for both conserved regions.

[1982] Consensus pattern: [EDQH]-x-K-x-[DN]-G-x-R-[GACIVM] [K is the active site residue]
[1983] Consensus pattern: E-G-[LIVMA]-[LIVM](2)-[KR]-x(5,8)-[YW]-[QNEK]-x(2,6)-[KRH]-x(3,5)-K-[LIVMFY]-K
Sequences known to belong to this class detected by the pattern: ALL, except for archaebacterial DNA ligases.

[1]
Tomkinson A.E., Totty N.F., Ginsburg M., Lindahl T.
Proc. Natl. Acad. Sci. U.S.A. 88:400-404(1991).
[2]
Lindahl T., Barnes D.E.
Annu. Rev. Biochem. 61:251-281(1992).
[3]
Kletzin A.
Nucleic Acids Res. 20:5389-5396(1992).

[1984] 812. (FAD_Gly3P_dh) FAD-dependent glycerol-3-phosphate dehydrogenase signatures
PROSITE cross-reference(s): PS00977; FAD_G3PDH_1, PS00978; FAD_G3PDH_2

[1985] FAD-dependent glycerol-3-phosphate dehydrogenase (EC 1.1.99.5) (GPD) catalyzes the conversion of glycerol-3-phosphate into dihydroxyacetone phosphate. In bacteria [1] it is associated with the utilization of glycerol coupled to respiration. In Escherichia coli, two isozymes are known: one expressed under anaerobic conditions (gene glpA) and one under aerobic conditions (gene glpD). In eukaryotes, a mitochondrial form of GPD participates in the glycerol phosphate shuttle in conjunction with an NAD-dependent cytoplasmic GPD (EC 1.1.1.8) [2,3].
[1986] These enzymes are proteins of about 60 to 70 Kd which contain a probable FAD-binding domain in their N-terminal extremity. The mammalian enzyme differs from the bacterial or yeast proteins by having an EF-hand calcium-binding region (see <PDOC00018>) in its C-terminal extremity.

[1987] Two signature patterns were developed. One is based on the first half of the FAD-binding domain; the other corresponds to a conserved region in the central part of these enzymes.
[1988] Consensus pattern: [IV]-G-G-G-x(2)-G-[STACV]-G-x-A-x-D-x(3)-R-G
Consensus pattern: G-G-K-x(2)-[GSTE]-Y-R-x(2)-A

[1] 

Austin D., Larson T.J. 

J. Bacteriol. 173:101-107(1991). 

[2] 

Roennow B., Kielland-Brandt M.C. 

Yeast 9:1121-1130(1993). 

[3] 

Brown L.J., McDonald M.J., Lehn D.A., Moran S.M. 
J. Biol. Chem. 269:14363-14366(1994). 

[1989] 813. (Fapy_DNA_glyco) Formamidopyrimidine-DNA glycosylase signature PROSITE cross-reference(s):

[1990] Formamidopyrimidine-DNA glycosylase (EC 3.2.2.23) [1] (Fapy-DNA glycosylase) (gene fpg) is an enzyme involved in DNA repair which excises oxidized purine bases to release 2,6-diamino-4-hydroxy-5N-methylformamidopyrimidine (Fapy) and 7,8-dihydro-8-oxoguanine (8-OxoG) residues. In addition, FPG can also nick DNA at apurinic/apyrimidinic sites (AP sites). FPG is a monomeric protein of about 32 Kd which binds and requires zinc for its activity.

[1991] The binding site for zinc seems to be located in the C-terminal part of the enzyme where four conserved and essential [2] cysteines are located. A signature pattern was developed based on this region.

[1992] Consensus pattern: C-x(2,4)-C-x-[GTAQ]-x-[IV]-x(7)-R-[GSTAN]-[STA]-x-[FYI]-C-x(2)-C-Q
[The four C's are putative zinc ligands]

[1]
Duwat P., de Oliveira R., Ehrlich S.D., Boiteux S.
Microbiology 141:411-417(1995).
[2]
O'Connor T.E., Graves R.J., Demurcia G., Castaing B., Laval J.
J. Biol. Chem. 268:9063-9070(1993).

[1993] 814. (G_glu_transpept) Gamma-glutamyltranspeptidase signature PROSITE cross-reference(s): PS00462; 
G_GLU_TRANSPEPTIDASE 

[1994] Gamma-glutamyltranspeptidase (EC 2.3.2.2) (GGT) [1] catalyzes the transfer of the gamma-glutamyl moiety of glutathione to an acceptor that may be an amino acid, a peptide or water (forming glutamate). GGT plays a key role in the gamma-glutamyl cycle, a pathway for the synthesis and degradation of glutathione. In prokaryotes and eukaryotes, it is an enzyme that consists of two polypeptide chains, a heavy and a light subunit, processed from a single chain precursor. The active site of GGT is known to be located in the light subunit.

[1995] The sequences of mammalian and bacterial GGT show a number of regions of high similarity [2]. Pseudomonas cephalosporin acylases (EC 3.5.1.-) that convert 7-beta-(4-carboxybutanamido)-cephalosporanic acid (GL-7ACA) into 7-aminocephalosporanic acid (7ACA) and glutaric acid are evolutionarily related to GGT and also show some GGT activity [3]. Like GGT, these GL-7ACA acylases are also composed of two subunits.
[1996] One of the conserved regions corresponds to the N-terminal extremity of the mature light chains of these enzymes. This region was used as a signature pattern.

[1997] Consensus pattern: T-[STA]-H-x-[ST]-[LIVMA]-x(4)-G-[SN]-x-V-[STA]-x-T-x-T-[LIVM]-[NE]-x(1,2)-[FY]-G

[1]
Tate S.S., Meister A.
Meth. Enzymol. 113:400-419(1985).
[2]
Suzuki H., Kumagai H., Echigo T., Tochikura T.
J. Bacteriol. 171:5169-5172(1989).
[3]
Ishiye M., Niwa M.
Biochim. Biophys. Acta 1132:233-239(1992).

[1998] 815. G-protein gamma subunit profile

PROSITE cross-reference(s): PS50058; G_PROTEIN_GAMMA

[1999] Guanine nucleotide-binding proteins (G proteins) [1] act as intermediaries in the transduction of signals generated by transmembrane receptors. G proteins consist of three subunits (alpha, beta, and gamma). The alpha subunit binds to and hydrolyzes GTP; the functions of the beta and gamma subunits are less clear but they seem to be required for the replacement of GDP by GTP as well as for membrane anchoring and receptor recognition.
[2000] The gamma subunits are small proteins (from 70 to 110 residues) that are bound to the membrane via an isoprenyl group (either a farnesyl or a geranylgeranyl) covalently linked to their C-terminus. In mammals there are at least 12 different isoforms of gamma subunits.

[2001] The Caenorhabditis elegans protein egl-10, which is a regulator of G-protein signalling, contains a G-protein gamma-like domain.

[2002] A profile was developed that spans the complete length of the gamma subunit.

[1]
Pennington S.R.
Protein Prof. 2:16-315(1995).
[2003] 816. GNS1/SUR4 family signature
PROSITE cross-reference(s): PS01188; GNS1_SUR4
[2004] The following group of eukaryotic integral membrane proteins, whose exact function has not yet clearly been established, are evolutionarily related [1]:

- Yeast GNS1 [2], a protein involved in synthesis of 1,3-beta-glucan.
- Yeast SUR4 (or APA1, SRE1) [3], a protein that could act in a glucose-signaling pathway that controls the expression of several genes that are transcriptionally regulated by glucose.
- Yeast hypothetical protein YJL196C.
- Caenorhabditis elegans hypothetical protein C40H1.4.
- Caenorhabditis elegans hypothetical protein D2024.3.

[2005] The proteins have from 290 to 435 amino acid residues. Structurally, they seem to be formed of three sections: an N-terminal region with two transmembrane domains, a central hydrophilic loop and a C-terminal region that contains from one to three transmembrane domains. A conserved region that contains three histidines was selected as a signature pattern. This region is located in the hydrophilic loop.
Consensus pattern: L-x-F-L-H-x-Y-H-H
[1]
Bairoch A.
Unpublished observations (1996).
[2]
El-Sherbeini M., Clemas J.A.
J. Bacteriol. 177:3227-3234(1995).
[3]
Garcia-Arranz M., Maldonado A.M., Mazon M.J., Portillo F.
J. Biol. Chem. 269:18076-18082(1994).

[2006] 817. Immunoglobulins and major histocompatibility complex proteins signature PROSITE cross-reference(s):

[2007] The basic structure of immunoglobulin (Ig) [1] molecules is a tetramer of two light chains and two heavy chains linked by disulfide bonds. There are two types of light chains: kappa and lambda, each composed of a constant domain (CL) and a variable domain (VL). There are five types of heavy chains: alpha, delta, epsilon, gamma and mu, all consisting of a variable domain (VH) and three (in alpha, delta and gamma) or four (in epsilon and mu) constant domains.

[2008] The major histocompatibility complex (MHC) molecules are made of two chains. In class I [2] the alpha chain is composed of three extracellular domains, a transmembrane region and a cytoplasmic tail. The beta chain (beta-2-microglobulin) is composed of a single extracellular domain. In class II [3], both the alpha and the beta chains are composed of two extracellular domains, a transmembrane region and a cytoplasmic tail.

[2009] It is known [4,5] that the Ig constant chain domains and a single extracellular domain in each type of MHC chains are related. These homologous domains are approximately one hundred amino acids long and include a conserved intradomain disulfide bond. A small pattern around the C-terminal cysteine involved in this disulfide bond can be used to detect this category of Ig-related proteins.

[2010] Consensus pattern: [FY]-x-C-x-[VA]-x-H
Sequences known to belong to this class detected by the pattern:
- Ig heavy chains type Alpha C region: All, in CH2 and CH3.
- Ig heavy chains type Delta C region: All, in CH3.
- Ig heavy chains type Epsilon C region: All, in CH1, CH3 and CH4.
- Ig heavy chains type Gamma C region: All, in CH3 and also CH1 in some cases.
- Ig heavy chains type Mu C region: All, in CH2, CH3 and CH4.
- Ig light chains type Kappa C region: In all CL except rabbit and Xenopus.
- Ig light chains type Lambda C region: In all CL except rabbit.
- MHC class I alpha chains: All, in alpha-3 domains, including in the cytomegalovirus MHC-1 homologous protein [6].
- Beta-2-microglobulin: All.
- MHC class II alpha chains: All, in alpha-2 domains.
- MHC class II beta chains: All, in beta-2 domains.






[1]
Gough N.
Trends Biochem. Sci. 6:203-205(1981).
[2]
Klein J., Figueroa F.
Immunol. Today 7:41-44(1986).
[3]
Figueroa F., Klein J.
Immunol. Today 7:78-81(1986).
[4]
Orr H.T., Lancet D., Robb R.J., Lopez de Castro J.A., Strominger J.L.
Nature 282:266-270(1979).
[5]
Cushley W., Owen M.J.
Immunol. Today 4:88-92(1983).
[6]
Beck S., Barrell B.G.
Nature 331:269-272(1988).



[2011] 818. (IGFBP) Insulin-like growth factor binding proteins signature PROSITE cross-reference(s): PS00222;
[2012] The insulin-like growth factors (IGF-I and IGF-II) bind to specific binding proteins in extracellular fluids with high affinity [1,2,3]. These IGF-binding proteins (IGFBP) prolong the half-life of the IGFs and have been shown to either inhibit or stimulate the growth promoting effects of the IGFs on cell culture. They seem to alter the interaction of IGFs with their cell surface receptors. There are at least six different IGFBPs and they are structurally related.
[2013] The following growth-factor inducible proteins are structurally related to IGFBPs and could function as growth-factor binding proteins [4,5]:

- Mouse protein cyr61 and its probable chicken homolog, protein CEF-10.
- Human connective tissue growth factor (CTGF) and its mouse homolog, protein FISP-12.
- Vertebrate protein NOV.

[2014] As a signature pattern a conserved cysteine-rich region located in the N-terminal section of these proteins is used.

[2015] Consensus pattern: G-C-[GS]-C-C-x(2)-C-A-x(6)-C

Sequences known to belong to this class detected by the pattern: ALL, except for IGFBP-6's.

[1]
Rechler M.M.
Vitam. Horm. 47:1-114(1993).
[2]
Shimasaki S., Ling N.
Prog. Growth Factor Res. 3:243-266(1991).
[3]
Clemmons D.R.
Trends Endocrinol. Metab. 1:412-417(1990).
[4]
Bradham D.M., Igarashi A., Potter R.L., Grotendorst G.R.
J. Cell Biol. 114:1285-1294(1991).
[5]
Maloisel V., Martinerie C., Dambrine G., Plassiart G., Brisac M., Crochet J., Perbal B.
Mol. Cell. Biol. 12:10-21(1992).



[2016] 819. LMWPc: Low molecular weight phosphotyrosine protein phosphatase
Number of members: 34

[2017] [1] Medline: 94329182. The crystal structure of a low-molecular-weight phosphotyrosine protein phosphatase. Su XD, Taddei N, Stefani M, Ramponi G, Nordlund P; Nature 1994;370:575-578.

[2018] 820. (myosin_head) ATP/GTP-binding site motif A (P-loop)

PROSITE cross-reference(s): PS00017; ATP_GTP_A
[2019] From sequence comparisons and crystallographic data analysis it has been shown [1,2,3,4,5,6] that an appreciable proportion of proteins that bind ATP or GTP share a number of more or less conserved sequence motifs. The best conserved of these motifs is a glycine-rich region, which typically forms a flexible loop between a beta-strand and an alpha-helix. This loop interacts with one of the phosphate groups of the nucleotide. This sequence motif is generally referred to as the 'A' consensus sequence [1] or the 'P-loop' [5].
[2020] There are numerous ATP- or GTP-binding proteins in which the P-loop is found. A number of protein families for which the relevance of the presence of such a motif has been noted is listed below:

- ATP synthase alpha and beta subunits (see <PDOC00137>).
- Myosin heavy chains.
- Kinesin heavy chains and kinesin-like proteins (see <PDOC00343>).
- Dynamins and dynamin-like proteins (see <PDOC00362>).
- Guanylate kinase (see <PDOC00670>).
- Thymidine kinase (see <PDOC00524>).
- Thymidylate kinase (see <PDOC01034>).
- Shikimate kinase (see <PDOC00868>).
- Nitrogenase iron protein family (nifH/frxC) (see <PDOC00580>).
- ATP-binding proteins involved in 'active transport' (ABC transporters) [7] (see <PDOC00185>).
- DNA and RNA helicases [8,9,10].
- GTP-binding elongation factors (EF-Tu, EF-1 alpha, EF-G, EF-2, etc.).
- Ras family of GTP-binding proteins (Ras, Rho, Rab, Ral, Ypt1, SEC4, etc.).
- Nuclear protein ran (see <PDOC00859>).
- ADP-ribosylation factors family (see <PDOC00781>).
- Bacterial dnaA protein (see <PDOC00771>).
- Bacterial recA protein (see <PDOC00131>).
- Bacterial recF protein (see <PDOC00539>).
- Guanine nucleotide-binding proteins alpha subunits (Gi, Gs, Gt, G0, etc.).
- DNA mismatch repair proteins mutS family (see <PDOC00388>).
- Bacterial type II secretion system protein E (see <PDOC00567>).



[2021] Not all ATP- or GTP-binding proteins are picked up by this motif. A number of proteins escape detection because the structure of their ATP-binding site is completely different from that of the P-loop. Examples of such proteins are the E1-E2 ATPases or the glycolytic kinases. In other ATP- or GTP-binding proteins the flexible loop exists in a slightly different form; this is the case for tubulins or protein kinases. A special mention must be reserved for adenylate kinase, in which there is a single deviation from the P-loop pattern: in the last position Gly is found instead of Ser or Thr.
[2022] Consensus pattern: [AG]-x(4)-G-K-[ST]
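The P-loop pattern [AG]-x(4)-G-K-[ST] is short enough to scan for directly, and the adenylate kinase remark above can be illustrated by relaxing only the last position. The sketch below is a hedged illustration: the function find_p_loop, its last_position parameter and the relaxed residue set "STG" are assumptions introduced here and are not part of the PROSITE entry; both test peptides are illustrative strings, not database records.

```python
import re


def find_p_loop(sequence: str, last_position: str = "ST"):
    """List (1-based start, matched segment) for every P-loop hit.

    last_position gives the residues accepted at the final position;
    the default reproduces [AG]-x(4)-G-K-[ST]. A lookahead is used so
    that overlapping hits are also reported.
    """
    pattern = re.compile(r"(?=([AG].{4}GK[%s]))" % last_position)
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]


# Illustrative peptides (assumptions, not taken from the source text).
RAS_LIKE = "MTEYKLVVVGAGGVGKSALTIQ"   # glycine-rich loop ending in Ser
AK_LIKE = "MRIILLGAPGAGKGTQ"          # loop ending in Gly, as described for adenylate kinase

STRICT_RAS = find_p_loop(RAS_LIKE)
STRICT_AK = find_p_loop(AK_LIKE)
RELAXED_AK = find_p_loop(AK_LIKE, last_position="STG")
```

With the strict pattern the first peptide gives one hit and the second gives none; allowing Gly at the last position recovers the second loop, at the cost of a less specific pattern.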



[1]
Walker J.E., Saraste M., Runswick M.J., Gay N.J.
EMBO J. 1:945-951(1982).
[2]
Moller W., Amons R.
FEBS Lett. 186:1-7(1985).
[3]
Fry D.C., Kuby S.A., Mildvan A.S.
Proc. Natl. Acad. Sci. U.S.A. 83:907-911(1986).
[4]
Dever T.E., Glynias M.J., Merrick W.C.
Proc. Natl. Acad. Sci. U.S.A. 84:1814-1818(1987).
[5]
Saraste M., Sibbald P.R., Wittinghofer A.
Trends Biochem. Sci. 15:430-434(1990).
[6]
Koonin E.V.
J. Mol. Biol. 229:1165-1174(1993).
[7]
Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R., Gallagher M.P.
J. Bioenerg. Biomembr. 22:571-592(1990).
[8]
Hodgman T.C.
Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).
[9]
Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K., Schnier J., Slonimski P.P.
Nature 337:121-122(1989).
[10]
Gorbalenya A.E., Koonin E.V., Donchenko A.P., Blinov V.M.
Nucleic Acids Res. 17:4713-4730(1989).



[2023] 821. PE: PE family

This family is named after a PE motif near the amino terminus of the domain. The PE family of proteins all contain an amino-terminal region of about 110 amino acids. The carboxyl termini of this family are variable and fall into several classes. The largest class of PE proteins is the highly repetitive PGRS class, which has a high glycine content. The function of these proteins is uncertain but it has been suggested that they may be related to antigenic variation of Mycobacterium tuberculosis [1]. Number of members: 88

[2024] [1] Medline: 98295987. Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Cole ST, Brosch R, Parkhill J, Garnier T, Churcher C, Harris D, Gordon SV, Eiglmeier K, Gas S, Barry CE 3rd, Tekaia F, Badcock K, Basham D, Brown D, Chillingworth T, Connor R, Davies R, Devlin K, Feltwell T, Gentles S, Hamlin N, Holroyd S, Hornsby T, Jagels K, Barrell BG, et al; Nature 1998;393:537-544.
[2025] 822. (RNB) Ribonuclease II family signature
PROSITE cross-reference(s): PS01175; RIBONUCLEASE_II
[2026] On the basis of sequence similarities, the following bacterial and eukaryotic proteins seem to form a family:

- Escherichia coli and related bacteria ribonuclease II (EC 3.1.13.1) (RNase II) (gene rnb) [1]. RNase II is an exonuclease involved in mRNA decay. It degrades mRNA by hydrolyzing single-stranded polyribonucleotides processively in the 3' to 5' direction.
- Bacterial protein vacB. In Shigella flexneri, vacB has been shown to be required for the expression of virulence genes at the posttranscriptional level.
- Yeast protein SSD1 (or SRK1), which is implicated in the control of the cell cycle G1 phase.
- Yeast protein DIS3 [2], which binds to ran (GSP1) and enhances the nucleotide-releasing activity of RCC1 on ran.
- Fission yeast protein dis3, which is implicated in mitotic control.
- Neurospora crassa cyt-4, a mitochondrial protein required for RNA 5' and 3' end processing and splicing.
- Yeast protein MSU1, which is involved in mitochondrial biogenesis.
- Synechocystis strain PCC 6803 protein zam [3], which controls resistance to the carbonic anhydrase inhibitor acetazolamide.
- Caenorhabditis elegans hypothetical protein F48E8.6.

[2027] The size of these proteins ranges from 644 residues (rnb) to 1250 (SSD1). While their sequence is highly divergent they share a conserved domain in their C-terminal section [4]. It is possible that this domain plays a role in a putative exonuclease function that would be common to all these proteins. A signature pattern was developed based on the core of this conserved domain.
[2028] Consensus pattern: [HI]-[FYE]-[GSTAM]-[LIVM]-x(4,5)-Y-[STAL]-x-[FWVAC]-[TV]-[SA]-P-[LIVMA]-[RQ]-[KR]-[FY]-x-D-x(3)-[HQ]

[1]
Zilhao R., Camelo L., Arraiano C.M.
Mol. Microbiol. 8:43-51(1993).
[2]
Noguchi E., Hayashi N., Azuma Y., Seki T., Nakamura M., Nakashima N., Yanagida M., He X., Mueller U., Sazer S., Nishimoto T.
EMBO J. 15:5595-5605(1996).
[3]
Beuf L., Bedu S., Cami B., Joset F.
Plant Mol. Biol. 27:779-788(1995).
[4]
Mian I.S.
Nucleic Acids Res. 25:3187-3195(1997).

[2029] 823. Src homology 2 (SH2) domain profile
PROSITE cross-reference(s): PS50001; SH2

[2030] The Src homology 2 (SH2) domain is a protein domain of about 100 amino-acid residues first identified as a conserved sequence region between the oncoproteins Src and Fps [1]. Similar sequences were later found in many other intracellular signal-transducing proteins [2]. SH2 domains function as regulatory modules of intracellular signalling cascades by interacting with high affinity to phosphotyrosine-containing target peptides in a sequence-specific and strictly phosphorylation-dependent manner [3,4,5,6].

[2031] The SH2 domain has a conserved 3D structure consisting of two alpha helices and six to seven beta-strands. The core of the domain is formed by a continuous beta-meander composed of two connected beta-sheets [7].
[2032] So far, SH2 domains have been identified in the following proteins:

- Many vertebrate, invertebrate and retroviral cytoplasmic (non-receptor) protein tyrosine kinases. In particular in the Src, Abl, Btk, Csk and ZAP70 families of kinases.
- Mammalian phosphatidylinositol-specific phospholipase C gamma-1 and -2. Two copies of the SH2 domain are found in those proteins in between the catalytic 'X-' and 'Y-boxes' (see <PDOC50007>).
- Mammalian phosphatidylinositol 3-kinase regulatory p85 subunit.
- Some vertebrate and invertebrate protein-tyrosine phosphatases.
- Adaptor proteins mediating binding of guanine nucleotide exchange factors to growth factor receptors: vertebrate GRB2, Caenorhabditis elegans sem-5 and Drosophila DRK.
- Cytoplasmic proteins Nck, Shc.
- STAT proteins (signal transducers and activators of transcription).
- Chicken tensin.
- Yeast transcriptional control protein SPT6.

[2033] The profile developed to detect SH2 domains is based on a structural alignment consisting of 8 gap-free blocks and 7 linker regions totaling 92 match positions.

[1]
Sadowski I., Stone J.C., Pawson T.
Mol. Cell. Biol. 6:4396-4408(1986).
[2]
Russel R.B., Breed J., Barton G.J.
FEBS Lett. 304:15-20(1992).
[3]
Marangere L.E.M., Pawson T.
J. Cell Sci. Suppl. 18:97-104(1994).
[4]
Pawson T., Schlessinger J.
Curr. Biol. 3:434-442(1993).
[5]
Mayer B.J., Baltimore D.
Trends Cell. Biol. 3:8-13(1993).
[6]
Pawson T.
Nature 373:573-580(1995).
[7]
Kuriyan J., Cowburn D.
Curr. Opin. Struct. Biol. 3:828-837(1993).

[2034] 824. Sulfate transporters signature 

terized proteins have been shown [1,2] to be evolutionary related. These proteins are: 

- Neurospora crassa sulfate permease II (gene cys-14).

- Yeast sulfate permeases (genes SUL1 and SUL2).

: n'i^oTaTo^^^^^^^ 

dysplasia (DTD).

- Sulfate transporters 1 , 2 and 3 from the legume Stylosanthes hamata. 

- Human pendrin (gene PDS), which is involved in a number of hearing loss genetic diseases. 

- Human protein DRA (Down-Regulated in Adenoma).

- Soybean early nodulin 70.

- Escherichia coli hypothetical protein ychM.

- Caenorhabditis elegans hypothetical protein F41D9.5.






[1]
Sandal N.N., Marcker K.A.
Trends Biochem. Sci. 19:19-19(1994).
[2]
Smith F.W., Hawkesford M.J., Prosser I.M., Clarkson D.T.
Mol. Gen. Genet. 247:709-715(1995).

[2038] 825. TYA: TYA transposon protein

Ty are yeast transposons. A 5.7 kb transcript codes for p3, a fusion protein of TYA and TYB. The TYA protein is analogous to the gag protein of retroviruses. TYA is cleaved to form a 46 kd protein which can form mature virion-like particles [1].
Number of members: 59

[2039] [1] Medline: 97404699. Cryo-electron microscopy structure of yeast Ty retrotransposon virus-like particles. Palmer KJ, Tichelaar W, Myers N, Burns NR, Butcher SJ, Kingsman AJ, Fuller SD, Saibil HR; J Virol 1997;71:6863-6868.
[2040] 826. Aldolase_II

Class II Aldolase and Adducin N-terminal domain.

This family includes class II aldolases and adducins which have not been ascribed any enzymatic function. Number of members: 37

References:
[2041]

[1] Medline: 93294819. The spatial structure of the class II L-fuculose-1-phosphate aldolase from Escherichia coli. Dreyer MK, Schulz GE; J Mol Biol 1993;231:549-553.

[2] Medline: 96256522. Catalytic mechanism of the metal-dependent fuculose aldolase from Escherichia coli as derived from the structure. Dreyer MK, Schulz GE; J Mol Biol 1996;259:458-466.

[2042] 827. CBD_2

- Two tryptophan residues are involved in cellulose binding.
- Cellulose binding domain found in bacteria.

Number of members: 51

References:

[2043] [1] Medline: 95284032. Solution structure of a cellulose-binding domain from Cellulomonas fimi by nuclear magnetic resonance spectroscopy. Xu GY, Ong E, Gilkes NR, Kilburn DG, Muhandiram DR, Harris-Brandts M, Carver JP, Kay LE, Harvey TS; Biochemistry 1995;34:6993-7009.
[2044] 828. P

A unique feature of the eukaryotic subtilisin-like proprotein convertases is the presence of an additional highly con- 
served sequence of approximately 150 residues (P domain) located immediately downstream of the catalytic domain. 
Number of members: 91 

References:

[2045]

[1] Medline: 94252314. A C-terminal domain conserved in precursor processing proteases is required for intramolecular N-terminal maturation of pro-Kex2 protease. Gluschankof P, Fuller RS; EMBO J 1994;13:2280-2288.

[2] Medline: 98225190. Regulatory roles of the P domain of the subtilisin-like prohormone convertases. Zhou A, Martin S, Lipkind G, LaMendola J, Steiner DF; J Biol Chem 1998;273:11107-11114.

[2046] 829. Uncharacterized protein family UPF0020 signature 
PROSITE cross-reference(s): PS01261; UPF0020

The following uncharacterized proteins have been shown [1] to share regions of similarities:

- Escherichia coli hypothetical protein ycbY and HI0116/15, the corresponding Haemophilus influenzae protein.
- Bacillus subtilis hypothetical protein ypsC.
- Synechocystis strain 6803 hypothetical protein slr0064.
- Methanococcus jannaschii hypothetical proteins MJ0438 and MJ0710.

[2047] These are hydrophilic proteins of from 40 Kd to about 80 Kd. They can be picked up in the database by the
following pattern.

[2048] Consensus pattern: D-P-[LIVMF]-C-G-[ST]-G-x(3)-[LI]-E
References:

[2049] [1] Bairoch A., Unpublished observations (1997).
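
A consensus pattern written in this notation can be turned into an ordinary regular expression in order to scan a sequence for it. The following is a minimal sketch under stated assumptions, not the procedure used to derive the signature: the helper name `prosite_to_regex` and the toy peptide are hypothetical and introduced only for illustration; only the pattern string itself comes from the text above.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Hypothetical helper written for illustration; it covers only the
    notation used in this document: x, [..], {..} and (n) or (n,m) repeats.
    """
    parts = []
    for element in pattern.split("-"):
        # Separate the residue specification from an optional repeat count.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element.strip())
        core, low, high = m.groups()
        if core == "x":
            core = "."                       # x: any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # {P}: any residue except P
        if low and high:
            core += "{" + low + "," + high + "}"
        elif low:
            core += "{" + low + "}"
        parts.append(core)
    return "".join(parts)

UPF0020 = "D-P-[LIVMF]-C-G-[ST]-G-x(3)-[LI]-E"
UPF0020_RE = re.compile(prosite_to_regex(UPF0020))

# Made-up peptide (not a real protein) that contains the motif.
toy = "MKDPLCGSGAAALEK"
hit = UPF0020_RE.search(toy)
print(prosite_to_regex(UPF0020), hit.start() if hit else None)
```

The same helper can be applied to the other consensus patterns listed in this section, provided they are first transcribed without typographical damage.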

[2050] 830. Uncharacterized protein family UPF0031 signatures

PROSITE cross-reference(s): PS01049; UPF0031_1; PS01050; UPF0031_2

The following uncharacterized proteins have been shown [1] to share regions of similarities:

- Yeast chromosome XI hypothetical protein YKL151c.
- Caenorhabditis elegans hypothetical protein R107.2.
- Escherichia coli hypothetical protein yjeF.
- Bacillus subtilis hypothetical protein yxkO.
- Helicobacter pylori hypothetical protein HP1363.
- Mycobacterium tuberculosis hypothetical protein MtCY77.05c.
- Mycobacterium leprae hypothetical protein B229_C2_201.
- Synechocystis strain PCC 6803 hypothetical protein sll1433.
- Methanococcus jannaschii hypothetical protein MJ1586.

[2051] These are proteins of about 30 to 40 Kd whose central region is well conserved. They can be picked up in
the database by the following patterns.

[2052] Consensus pattern: [SAV]-[IVW]-[LVA]-[LIV]-G-[PNS]-G-L-[GP]-x-[DENQT]
Consensus pattern: [GA]-G-x-G-D-[TV]-[LT]-[STA]-G-x-[LIVM]
[2053] 831. (ACOX)

r20M?^sra^^^ 

[11- 

Number of members: 39 

[2055] [1] Hayashi H, De Bellis L, Yamaguchi K, Kato A, Hayashi M, Nishimura M; Medline: 98192624 "Molecular
characterization of a glyoxysomal long chain acyl-CoA oxidase that is synthesized as a precursor of higher molecular
mass in pumpkin." J Biol Chem 1998;273:8301-8307.
[2056] 832. (AICARFT_IMPCHas)

[2057] This is a family of bifunctional enzymes catalysing the last steps in de novo purine biosynthesis. The bifunctional enzyme is found in both prokaryotes and eukaryotes. The second last step is catalysed by 5-aminoimidazole-4-carboxamide ribonucleotide formyltransferase EC:2.1.2.3 (AICARFT); this enzyme catalyses the formylation of AICAR with 10-formyl-tetrahydrofolate to yield FAICAR and tetrahydrofolate [1]. The last step is catalysed by IMP (inosine monophosphate) cyclohydrolase EC:3.5.4.10 (IMPCHase), cyclizing FAICAR (5-formylaminoimidazole-4-carboxamide ribonucleotide) to IMP [1].
Number of members: 22 
[2058]

[1] Akira T, Komatsu M, Nango R, Tomooka A, Konaka K, Yamauchi M, Kitamura Y, Nomura S, Tsukamoto I; Medline: 97473523 "Molecular cloning and expression of a rat cDNA encoding 5-aminoimidazole-4-carboxamide ribonucleotide formyltransferase/IMP cyclohydrolase." [published erratum appears in Gene 1998 Feb 27;208(2)]

[2] Rayl EA, Moroson BA, Beardsley GP; Medline: 96147205 "The human purH gene product, 5-aminoimidazole-4-carboxamide ribonucleotide formyltransferase/IMP cyclohydrolase. Cloning, sequencing, expression, purification, kinetic analysis, and domain mapping." J Biol Chem 1996;271:2225-2233.

[2059] 833. (AOX) 
Alternative oxidase 

[2060] The alternative oxidase is used as a second terminal oxidase in the mitochondria; electrons are transferred
directly from reduced ubiquinol to oxygen, forming water [2]. This is not coupled to ATP synthesis and is not inhibited
by cyanide; this pathway is a single step process [1]. In rice the transcript levels of the alternative oxidase are increased
by low temperature [1].

Number of members: 27

[2061] 

[1] Ito Y, Saisho D, Nakazono M, Tsutsumi N, Hirai A; Medline: 98086211 "Transcript levels of tandem-arranged
alternative oxidase genes in rice are increased by low temperature." Gene 1997;203:121-129.

[2] Li Q, Ritzel RG, McLean LL, McIntosh L, Ko T, Bertrand H, Nargang FE; Medline: 96366413 "Cloning and analysis
of the alternative oxidase gene of Neurospora crassa." Genetics 1996;142:129-140.

[2062] 834. (APH)

Protein kinases signatures and profile 

[2063] Cross-reference(s): PS00107; PROTEIN_KINASE_ATP, PS00108; 
PROTEIN_KINASE_ST. PS00109; PROTEIN_KINASE_TYR, PS50011; 
PROTEIN_KINASE_DOM 

[2064] Eukaryotic protein kinases [1 to 5] are enzymes that belong to a very extensive family of proteins which share
a conserved catalytic core common to both serine/threonine and tyrosine protein kinases. There are a number of
conserved regions in the catalytic domain of protein kinases. Two of these regions have been selected to build signature
patterns. The first region, which is located in the N-terminal extremity of the catalytic domain, is a glycine-rich stretch
of residues in the vicinity of a lysine residue, which has been shown to be involved in ATP binding. The second region,
which is located in the central part of the catalytic domain, contains a conserved aspartic acid residue which is important
for the catalytic activity of the enzyme [6]; two signature patterns were derived for that region: one specific for
serine/threonine kinases and the other for tyrosine kinases. A profile was developed which is based on the alignment in [1]
and covers the entire catalytic domain.

[2065] Consensus pattern: [LIV]-G-{P}-G-{P}-[FYWMGSTNH]-[SGA]-{PW}-[LIVCAT]-{PD}-x-[GSTACLIVMFY]-x(5,18)-[LIVMFYWCSTAR]-[AIVP]-[LIVMFAGCKR]-K [K binds ATP]

[2066] Sequences known to belong to this class detected by the pattern: the majority of known protein kinases, but it
fails to find a number of them, especially viral kinases, which are quite divergent in this region and are completely
missed by this pattern.

[2067] Consensus pattern: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-K-x(2)-N-[LIVMFYCT](3) [D is an active site residue]
[2068] Sequences known to belong to this class detected by the pattern: most serine/threonine specific protein
kinases, with 10 exceptions (half of them viral kinases) and also Epstein-Barr virus BGLF4 and Drosophila ninaC, which
have respectively Ser and Arg instead of the conserved Lys and which are therefore detected by the tyrosine kinase
specific pattern described below.

[2069] Consensus pattern: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-[RSTAC]-x(2)-N-[LIVMFYC](3) [D is an active site residue]
Sequences known to belong to this class detected by the pattern: ALL tyrosine specific protein kinases with the exception of human ERBB3 and mouse blk. This pattern will also detect
most bacterial aminoglycoside phosphotransferases [8,9] and herpesviruses ganciclovir kinases [10], which are proteins
structurally and evolutionarily related to protein kinases. Sequences known to belong to this class detected by the
profile: ALL, except for three viral kinases. This profile also detects receptor guanylate cyclases (see <PDOC00430>)
and 2-5A-dependent ribonucleases. Sequence similarities between these two families and the eukaryotic protein kinase
family have been noticed before. It also detects Arabidopsis thaliana kinase-like protein TMKL1, which seems to have
lost its catalytic activity.
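
The three signature patterns can be applied together to see which of them a given sequence carries. The sketch below is illustrative only: the regular expressions are a hand transcription of the printed patterns (readings such as `x(5,18)` and the closing brackets are our interpretation of the printed text), the function name `kinase_signatures` is hypothetical, and the two toy sequences are made up so that each contains the ATP-binding motif plus one active-site motif. It does not replace the profile, which the text describes as the more sensitive method.

```python
import re

# Hand-transcribed regular expressions for the three consensus patterns.
ATP_BINDING = re.compile(
    r"[LIV]G[^P]G[^P][FYWMGSTNH][SGA][^PW][LIVCAT][^PD].[GSTACLIVMFY]"
    r".{5,18}[LIVMFYWCSTAR][AIVP][LIVMFAGCKR]K"
)
SER_THR_ACTIVE_SITE = re.compile(r"[LIVMFYC].[HY].D[LIVMFY]K.{2}N[LIVMFYCT]{3}")
TYR_ACTIVE_SITE = re.compile(r"[LIVMFYC].[HY].D[LIVMFY][RSTAC].{2}N[LIVMFYC]{3}")

def kinase_signatures(sequence: str) -> dict:
    """Report which signatures occur in a sequence (hypothetical helper)."""
    seq = sequence.upper()
    atp = ATP_BINDING.search(seq) is not None
    ser_thr = SER_THR_ACTIVE_SITE.search(seq) is not None
    tyr = TYR_ACTIVE_SITE.search(seq) is not None
    return {
        "atp_binding": atp,
        "ser_thr": ser_thr,
        "tyr": tyr,
        # ATP-binding motif together with either active-site motif.
        "both_signatures": atp and (ser_thr or tyr),
    }

# Made-up sequences, not real proteins.
toy_ser_thr = "MS" + "LGEGSFGKVYKGKLDGTEVAIK" + "AAAA" + "IIHRDLKPENILL" + "GG"
toy_tyr = "MS" + "LGEGSFGKVYKGKLDGTEVAIK" + "AAAA" + "LVHRDLAARNILV" + "GG"

print(kinase_signatures(toy_ser_thr))
print(kinase_signatures(toy_tyr))
```

Because the serine/threonine and tyrosine patterns differ only at the position after the conserved aspartate-hydrophobic pair (K versus [RSTAC]) and in the final triplet, a sequence normally matches one of them but not both.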

[2070] Note: if a protein analyzed includes the two protein kinase signatures, the probability of it being a protein kinase
is close to 100%. Note: eukaryotic-type protein kinases have also been found in prokaryotes such as Myxococcus
xanthus [11] and Yersinia pseudotuberculosis. Note: the patterns shown above have been updated since their publication
in [7]. Note: this documentation entry is linked to both signature patterns and a profile. As the profile is much more
sensitive than the patterns, you should use it if you have access to the necessary software tools to do so.
References
[2071]

[1] Hanks S.K., Hunter T., FASEB J. 9:576-596(1995).

[2] Hunter T., Meth. Enzymol. 200:3-37(1991).

[3] Hanks S.K., Quinn A.M., Meth. Enzymol. 200:38-62(1991).

[4] Hanks S.K., Curr. Opin. Struct. Biol. 1:369-383(1991).

[5] Hanks S.K., Quinn A.M., Hunter T., Science 241:42-52(1988).

[6] Knighton D.R., Zheng J., Ten Eyck L.F., Ashford V.A., Xuong N.-H., Taylor S.S., Sowadski J.M., Science 253:407-414(1991).

[7] Bairoch A., Claverie J.-M., Nature 331:22(1988).

[8] Benner S., Nature 329:21-21(1987).

[9] Kirby R., J. Mol. Evol. 30:489-492(1992).

[10] Littler E., Stuart A.D., Chee M.S., Nature 358:160-162(1992).

[11] Munoz-Dorado J., Inouye S., Inouye M., Cell 67:995-1006(1991).

[2072] 835. (Asp_Glu_race) 

Aspartate and glutamate racemases signatures
[2073] Cross-reference(s) PS00923; ASP_GLU_RACEMASE_1 PS00924; ASP_GLU_RACEMASE_2

[2074] Aspartate and glutamate racemases are enzymes that do not seem to require a cofactor for their activity [1]. Glutamate racemase, which interconverts L-glutamate into D-glutamate, is required for the biosynthesis of peptidoglycan and some peptide-based antibiotics such as gramicidin S. In addition to characterized aspartate and glutamate racemases, this family also includes a hypothetical protein from Erwinia carotovora and one from Escherichia coli (ygeA). Two conserved cysteines are present in the sequence of these enzymes. They are expected to play a role in catalytic activity by acting as bases in proton abstraction from the substrate. Signature patterns were developed for both cysteines.

[2075] Consensus pattern: [IVA]-[LIVM]-x-C-x(0,1)-N-[ST]-[MSA]-[STH]-[LIVFYSTANK]

Consensus pattern: [LIVM](2)-x-[AG]-C-T-[DEH]-[LIVMFY]-[PNGRS]-x-[LIVM]

[2076] [1] Gallo K.A., Knowles J.R., Biochemistry 32:3981-3990(1993).

[2077] 836. (ATP-sulfurylase)

[2078] This family consists of ATP-sulfurylase or sulfate adenylyltransferase EC:2.7.7.4, some of which are part of a
bifunctional polypeptide chain associated with adenosyl phosphosulphate (APS) kinase. Both enzymes
are required for PAPS (phosphoadenosine-phosphosulfate) synthesis from inorganic sulphate [2]. ATP sulfurylase
catalyses the synthesis of adenosine-phosphosulfate APS from ATP and inorganic sulphate [1].

Number of members: 37
[2079]

[1] Kurima K, Warman ML, Krishnan S, Domowicz M, Krueger RC Jr, Deyrup A, Schwartz NB; Medline: 98337975
"A member of a family of sulfate-activating enzymes causes murine brachymorphism." [published erratum appears
in Proc Natl Acad Sci U S A 1998 Sep 29;95(20):12071] Proc Natl Acad Sci U S A 1998;95:8681-8685.
[2] Rosenthal E, Leustek T; Medline: 96096529 "A multifunctional Urechis caupo protein, PAPS synthetase, has
both ATP sulfurylase and APS kinase activities." Gene 1995;165:243-248.

[2080] 837. (ATP-synt_F) 
r^BlTTh^fSfnc^^^^^^^^^^^ 

[2]. The family also includes archaebacterial ATP synthase subunit F [3]. 

Number of members: 23 

[2082] 

[1] Guo Y, Kaiser K, Wieczorek H, Dow JA; Medline: 96269411 "The Drosophila melanogaster gene vha14 encoding
a 14-kDa F-subunit of the vacuolar ATPase." Gene 1996;172:239-243.

[2] Peng SB, Crider BP, Tsai SJ, Xie XS, Stone DK; Medline: 96216416 "Identification of a 14-kDa subunit associated
with the catalytic sector of clathrin-coated vesicle H+-ATPase." J Biol Chem 1996;271:3324-3327.
[3] Wilms R, Freiberg C, Wegerle E, Meier I, Mayer F, Muller V; Medline: 96324968 "Subunit structure and organization
of the genes of the A1A0 ATPase from the Archaeon Methanosarcina mazei Go1." J Biol Chem 1996;271:18843-18852.

[2083] 838. (CBD_4) 
Starch binding domain 


Number of members: 48 
[2084] 839. (CbiX) 

[2085] The function of CbiX is uncertain; however, it is found in cobalamin biosynthesis operons and so may have a
related function. Some CbiX proteins contain a striking histidine-rich region at their C-terminus, which suggests that it
might be involved in metal chelation [1].

Number of members: 6

[2086] [1] Raux E, Lanois A, Warren MJ, Rambach A, Thermes C; Medline: 98416126 "Cobalamin (vitamin B12)
biosynthesis: identification and characterization of a Bacillus megaterium cobI operon." Biochem J 1998;335:159-166.

840. (Complex1_51K) 

[2087] Respiratory-chain NADH dehydrogenase 51 Kd subunit signatures
Cross-reference(s) PS00644; COMPLEX1_51K_1 PS00645; COMPLEX1_51K_2

[2088] Respiratory-chain NADH dehydrogenase (EC 1.6.5.3) [1,2] (also known as complex I or NADH-ubiquinone
oxidoreductase) is an oligomeric enzymatic complex located in the inner mitochondrial membrane which also seems
to exist in the chloroplast and in cyanobacteria (as a NADH-plastoquinone oxidoreductase). Among the 25 to 30
polypeptide subunits of this bioenergetic enzyme complex there is one with a molecular weight of 51 Kd (in mammals),
which is the second largest subunit of complex I and is a component of the iron-sulfur (IP) fragment of the enzyme. It
seems to bind to NAD, FMN, and a 2Fe-2S cluster.
[2089] The 51 Kd subunit is highly similar to [3,4]:

- Subunit alpha of Alcaligenes eutrophus NAD-reducing hydrogenase (gene hoxF), which also binds to NAD, FMN,
  and a 2Fe-2S cluster.
- Subunit NQO1 of Paracoccus denitrificans NADH-ubiquinone oxidoreductase.
- Subunit F of Escherichia coli NADH-ubiquinone oxidoreductase (gene nuoF).

[2090] The 51 Kd subunit and the bacterial hydrogenase alpha subunit contain three regions of sequence similarity.
The first one most probably corresponds to the NAD-binding site, the second to the FMN-binding site, and the
third one, which contains three cysteines, to the iron-sulfur binding region. Signature patterns have been developed
for the FMN-binding and for the 2Fe-2S binding regions.

[2091] Consensus pattern: G-[AM]-G-[AR]-Y-[LIVM]-C-G-[DE](2)-[STA](2)-[LIM](2)-[EN]-S
Consensus pattern: E-S-C-G-x-C-x-P-C-R-x-G [The three C's are putative 2Fe-2S ligands]

[1] Ragan C.I., Curr. Top. Bioenerg. 15:1-36(1987).
[2] Weiss H., Friedrich T., Hofhaus G., Preis D., Eur. J. Biochem. 197:563-576(1991).
[3] Fearnley I.M., Walker J.E., Biochim. Biophys. Acta 1140:105-134(1992).
[4] Weidner U., Geier S., Rock A., Friedrich T., Leif H., Weiss H., J. Mol. Biol. 233:109-122(1993).

[2092] 841. (DAP_epimerase) 
Diaminopimelate epimerase signature 
[2093] Cross-reference(s) PS01326; DAP_EPIMERASE
Diaminopimelate epimerase (EC 5.1.1.7) catalyzes the isomerization of L,L- to D,L-meso-diaminopimelate in the
biosynthetic pathway leading from aspartate to lysine. This enzyme is a protein of about 30 Kd. Two conserved cysteines
seem [1] to function as the acid and base in the catalytic mechanism. As a signature pattern, the region surrounding
the first of these two active site cysteines was selected.






[2094] Consensus pattern: N-x-D-G-S-x(4)-C-G-N-[GA]-x-R [C is an active site residue] Sequences known to belong
to this class detected by the pattern: ALL, except for an Anabaena dapF which has a Ser instead of the active site Cys.
[2095] [1] Cirilli M., Zheng R., Scapin G., Blanchard J.S., Biochemistry 37:16452-16458(1998).
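
Since the diaminopimelate epimerase pattern marks a specific residue as the active site, a scan can report the position of that cysteine rather than just the presence of a match. The following is a minimal sketch under stated assumptions: the function name `active_site_position`, the `relaxed` option, and both toy peptides are hypothetical. The relaxed variant, which also accepts Ser at the active-site position, is our own illustration of how a divergent sequence such as the Anabaena dapF mentioned above could still be located; it is not part of the published signature.

```python
import re
from typing import Optional

# N-x-D-G-S-x(4)-C-G-N-[GA]-x-R, with the active-site residue captured.
DAP_EPIMERASE = re.compile(r"N.DGS.{4}(C)GN[GA].R")
# Assumed relaxed form that tolerates Ser in place of the active-site Cys.
DAP_EPIMERASE_RELAXED = re.compile(r"N.DGS.{4}([CS])GN[GA].R")

def active_site_position(sequence: str, relaxed: bool = False) -> Optional[int]:
    """Return the 1-based position of the active-site residue, or None."""
    pattern = DAP_EPIMERASE_RELAXED if relaxed else DAP_EPIMERASE
    hit = pattern.search(sequence.upper())
    return hit.start(1) + 1 if hit else None

# Made-up peptides, not real proteins.
toy_cys = "MMNADGSAAAACGNGTRKK"
toy_ser = "MMNADGSAAAASGNGTRKK"

print(active_site_position(toy_cys))
print(active_site_position(toy_ser))
print(active_site_position(toy_ser, relaxed=True))
```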
[2096] 842. (DNA_gyraseB_C) 
DNA topoisomerase II signature 

[2097] Cross-reference(s) PS00177; TOPOISOMERASE_II

DNA topoisomerase II (EC 5.99.1.3) [1,2,3,4,E1] is one of the two types of enzyme that catalyze the interconversion
of topological DNA isomers. Type II topoisomerases are ATP-dependent and act by passing a DNA segment through
a transient double-strand break. Topoisomerase II is found in phages, archaebacteria, prokaryotes, eukaryotes, and
in African Swine Fever virus (ASF). In bacteriophage T4, topoisomerase II consists of three subunits (the product of
genes 39, 52 and 60). In prokaryotes and in archaebacteria the enzyme, known as DNA gyrase, consists of two subunits
(genes gyrA and gyrB [E2]). In some bacteria, a second type II topoisomerase has been identified; it is known as
topoisomerase IV and is required for chromosome segregation. It also consists of two subunits (genes parC and parE).
In eukaryotes, type II topoisomerase is a homodimer.

[2098] There are many regions of sequence homology between the different subtypes of topoisomerase II. The 
relation between the different subunits is shown in the following representation: 



<------------- About 1400 residues ------------->

[ Protein 39 * ][ Protein 52 ]    Phage T4
[ gyrB       * ][ gyrA       ]    Prokaryote II, Archaebacteria
[ parE       * ][ parC       ]    Prokaryote IV
[            *               ]    Eukaryote and ASF

'*': Position of the pattern.



[2099] As a signature pattern for this family of proteins, a region that contains a highly conserved pentapeptide was
selected. The pattern is located in gyrB, in parE, and in protein 39 of phage T4 topoisomerase.
[2100] Consensus pattern: [LIVMA]-x-E-G-[DN]-S-A-x-[STAG]

[1] Sternglanz R., Curr. Opin. Cell Biol. 1:533-535(1990).

[2] Bjornsti M.-A., Curr. Opin. Struct. Biol. 1:99-103(1991).

[3] Sharma A., Mondragon A., Curr. Opin. Struct. Biol. 5:39-47(1995).

[4] Roca J., Trends Biochem. Sci. 20:156-160(1995).

[2101] 843. (DUF16) 
Protein of unknown function 

[2102] The function of this protein is unknown. It appears to only occur in Mycoplasma pneumoniae.
Number of members: 26

[2103] [1] Himmelreich R, Hilbert H, Plagens H, Pirkl E, Li BC, Herrmann R; Medline: 97105885 "Complete sequence
analysis of the genome of the bacterium Mycoplasma pneumoniae." Nucleic Acids Res 1996;24:4420-4449.
[2104] 844. (DUF21)

[2105] Domain of unknown function

[2106] This transmembrane region has no known function. Many of the sequences in this family are annotated as
hemolysins; however, this is due to a similarity to Swiss:Q54318 that does not contain this domain. This domain is
found in the N-terminus of the proteins adjacent to two intracellular CBS domains.
Number of members: 42

[2107] 845. (DUF56)
[2108] Integral membrane protein
[2109] The members of this family are putative integral membrane proteins. The function of the family is unknown;
however, the family includes Sec59 from yeast. Sec59 is a dolichol kinase EC:2.7.1.108, but it is not clear if the enzymatic
activity resides in this region or its N-terminal region.

Number of members: 13

[2110] 846. (DUF94) 

[2111] Domain of unknown function 

[2112] The function of this domain is unknown. It is found in both eukaryotes and archaebacteria. The alignment
contains a completely conserved aspartate residue that may be functionally important. The eukaryotic domains contain
three conserved cysteines and a histidine that might be metal binding; however, these are absent in the archaebacterial
proteins.

Number of members: 9

[2113] 847. (FF)
[2114] FF domain 

[2115] This domain may be involved in protein-protein interaction [1]. 
Number of members: 42 

[2116] [1] Bedford MT, Leder P; Medline: 99322199 "The FF domain: a novel motif that often accompanies WW
domains." Trends Biochem Sci 1999;24:264-265.
[2117] 848. (FLO_LFY)
Floricaula / Leafy protein 

[2118] This family consists of various plant development proteins which are homologues of floricaula (FLO) and Leafy
(LFY) proteins which are floral meristem identity proteins. Mutations in the sequences of these proteins affect flower
and leaf development.

Number of members: 16

[2119]

[1] Hofer J, Turner L, Hellens R, Ambrose M, Matthews P, Michael A, Ellis N; Medline: 97411151 "UNIFOLIATA
regulates leaf and flower morphogenesis in pea." Curr Biol 1997;7:581-587.
[2] Weigel D, Alvarez J, Smyth DR, Yanofsky MF, Meyerowitz EM; Medline: 92274452 "LEAFY controls floral meristem
identity in Arabidopsis." Cell 1992;69:843-859.

[2120] 849. (G-patch) 
G-patch domain 

[2121] This domain is found in a number of RNA binding proteins, and is also found in proteins that contain RNA
binding domains. This suggests that this domain may have an RNA binding function. This domain has seven highly
conserved glycines.

Number of members: 47

[2122] [1] Aravind L, Koonin EV; Medline: 10470032 "G-patch: a new conserved domain in eukaryotic RNA-processing
proteins and type D retroviral polyproteins." Trends Biochem Sci 1999;24:342-344.
[2123] 850. (Gram-ve_porins)
General diffusion Gram-negative porins signature
[2124] Cross-reference(s) PS00576; GRAM_NEG_PORIN

The outer membrane of Gram-negative bacteria acts as a molecular filter for hydrophilic compounds. Proteins, known
as porins [1], are responsible for the 'molecular sieve' properties of the outer membrane. Porins form large water-filled
channels which allow the diffusion of hydrophilic molecules into the periplasmic space. Some porins form general
and the transmembrane core of these proteins is composed exclusively of beta strands [2]. It has been shown [3] that
a number of general porins are evolutionarily related; these porins are:

- Enterobacteria phoE.
- Enterobacteria ompC.
- Enterobacteria ompF.
- Enterobacteria nmpC.
- Bacteriophage PA-2 LC.
- Neisseria PI.A.
- Neisseria PI.B.

[2125] As a signature pattern a conserved region was selected, located in the C-terminal part of these proteins,
which spans two putative transmembrane beta strands.

tS^rol^r^us pattern: [LlVMFY]-x(2)-G-x(2).Y-x.^^^^^^ 

[1] Benz R., Bauer K., Eur. J. Biochem. 176:1-19(1988).

[2] Jap B.K., Walian P.J., Q. Rev. Biophys. 23:367-403(1990).

[3] Jeanteur D., Lakey J.H., Pattus F., Mol. Microbiol. 5:2153-2164(1991).

[2127] 851. (HlyD)

HlyD family secretion proteins signature 

composed [1 to 5] of the following members: 



Gene   Species                       Protein which is exported
hlyD   Escherichia coli              Hemolysin
appD   A. pleuropneumoniae           Hemolysin
lcnD   Lactococcus lactis            Lactococcin A
lktD   A. actinomycetemcomitans,     Leukotoxin
       Pasteurella haemolytica
rtxD   A. pleuropneumoniae           Toxin-III
cyaD   Bordetella pertussis          Calmodulin-sensitive adenylate cyclase-hemolysin (cyclolysin)
cvaA   Escherichia coli              Colicin V
prtE   Erwinia chrysanthemi          Extracellular proteases B and C
aprE   Pseudomonas aeruginosa        Alkaline protease
emrA   Escherichia coli              Drugs and toxins
yjcR   Escherichia coli              Unknown



These proteins are evolutionarily related and consist of from 390 to 480 amino acid residues.

'~cTrars?ctL^^^^^^ 

ti'^t consensus pattern: [LIVM1-x(2)-G-[LM]-x(3HSTGAV]-x-[LIVMT1-x-^^ 

Sequences known to belong to this class detected by the pattern: ALL, except for emrA and yjcR.
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References: 
[2130]

[1] Gilson L., Mahanty H.K., Kolter R., EMBO J. 9:3875-3884(1990).

[2] Letoffe S., Delepelaire P., Wandersman C., EMBO J. 9:1375-1382(1990).

[3] Stoddard G.W., Petzel J.P., van Belkum M.J., Kok J., McKay L.L., Appl. Environ. Microbiol. 58:1952-1961(1992).
[4] Duong F., Lazdunski A., Cami B., Murgier M., Gene 121:47-54(1992).
[5] Lewis K., Trends Biochem. Sci. 19:119-123(1994).

[2131] 852. (IBR) 
In Between Ring fingers 

[2132] The IBR (In Between Ring fingers) domain is found to occur between pairs of ring fingers (zf-C3HC4). The
function of this domain is unknown. This domain has also been called the C6HC domain and DRIL (for double RING
finger linked) domain [2].

Number of members: 25

[2133]

[1] Morett E, Bork P; Medline: 10366851 "A novel transactivation domain in parkin." Trends Biochem Sci 1999;24:229-231.

[2] van der Reijden BA, Erpelinck-Verschueren CA, Lowenberg B, Jansen JH; Medline: 99349709 "TRIADS: a new
class of proteins with a novel cysteine-rich signature." Protein Sci 1999;8:1557-1561.

[2134] 853. (IPPT) 
IPP transferase 

[1] Durand JM, Bjork GR, Kuwae A, Yoshikawa M, Sasakawa C; Medline: 97440126 "The modified nucleoside
2-methylthio-N6-isopentenyladenosine in tRNA of Shigella flexneri is required for expression of virulence genes."
J Bacteriol 1997;179:5777-5782.

[2] Boguta M, Hunter LA, Shen WC, Gillman EC, Martin NC, Hopper AK; Medline: 94187700 "Subcellular locations
of MOD5 proteins: mapping of sequences sufficient for targeting to mitochondria and demonstration that mitochondrial
and nuclear isoforms commingle in the cytosol." Mol Cell Biol 1994;14:2298-2306.
[3] Gillman EC, Slusher LB, Martin NC, Hopper AK; Medline: 91203856 "MOD5 translation initiation sites determine
N6-isopentenyladenosine modification of mitochondrial and cytoplasmic tRNA." Mol Cell Biol 1991;11:2382-2390.

[2135] 854. (KE2) 
KE2 family protein 

[2136] The function of members of this family is unknown, although they have been suggested to contain a DNA
binding leucine zipper motif [2].

Number of members: 9

[2137]

[1] Ha H, Abe K, Artzt K; Medline: 92084131 "Primary structure of the embryo-expressed gene KE2 from the mouse
H-2K region." Gene 1991;107:345-346.

[2] Shang HS, Wong SM, Tan HM, Wu M; Medline: 95129859 "YKE2, a yeast nuclear gene encoding a protein
showing homology to mouse KE2 and containing a putative leucine-zipper motif." Gene 1994;151:197-201.

[2138] 855. (Lipoprotein_6) 

Prokaryotic membrane lipoprotein lipid attachment site 
[2139] Cross-reference(s) PS00013; PROKAR_LIPOPROTEIN
In prokaryotes, membrane lipoproteins are synthesized with a precursor signal peptide, which is cleaved by a specific
lipoprotein signal peptidase (signal peptidase II). The peptidase recognizes a conserved sequence and cuts upstream
of a cysteine residue to which a glyceride-fatty acid lipid is attached [1]. Some of the proteins known to undergo such
processing currently include (for recent listings see [1,2,3]):

- Major outer membrane lipoprotein (murein-lipoproteins) (gene lpp).
- Escherichia coli lipoprotein-28 (gene nlpA).
- Escherichia coli lipoprotein-34 (gene nlpB).
- Escherichia coli lipoprotein nlpC.
- Escherichia coli lipoprotein nlpD.
- Escherichia coli osmotically inducible lipoprotein B (gene osmB).
- Escherichia coli osmotically inducible lipoprotein E (gene osmE).
- Escherichia coli peptidoglycan-associated lipoprotein (gene pal).
- Escherichia coli rare lipoproteins A and B (genes rplA and rplB).
- Escherichia coli copper homeostasis protein cutF (or nlpE).
- Escherichia coli plasmids traT proteins.
- Escherichia coli Col plasmids lysis proteins.
- A number of Bacillus beta-lactamases.
- Bacillus subtilis periplasmic oligopeptide-binding protein (gene oppA).
- Borrelia burgdorferi outer surface proteins A and B (genes ospA and ospB).
- Borrelia hermsii variable major protein 21 (gene vmp21) and 7 (gene vmp7).
- Chlamydia trachomatis outer membrane protein 3 (gene omp3).
- Fibrobacter succinogenes endoglucanase cel-3.
- Haemophilus influenzae proteins Pal and Pep.
- Klebsiella pullulanase (gene pulA).
- Klebsiella pullulanase secretion protein pulS.
- Mycoplasma hyorhinis protein p37.
- Mycoplasma hyorhinis variant surface antigens A, B, and C (genes vlpABC).
- Neisseria outer membrane protein H.8.
- Pseudomonas aeruginosa lipopeptide (gene lppL).
- Pseudomonas solanacearum endoglucanase egl.
- Rhodopseudomonas viridis reaction center cytochrome subunit (gene cytC).
- Rickettsia 17 Kd antigen.
- Shigella flexneri invasion plasmid proteins mxiJ and mxiM.
- Streptococcus pneumoniae oligopeptide transport protein A (gene amiA).
- Treponema pallidum 34 Kd antigen.
- Treponema pallidum membrane protein A (gene tmpA).
- Vibrio harveyi chitobiase (gene chb).

: is:rs°s^™"Xon» 141, — ™' ^ 

archaebacterial protein known to be modified in such a fashion).
[2140] From the precursor sequences of all these proteins, a consensus pattern and a set of rules to identify this 

C^sTnts" '''' 
site] Additional rules: 1) onH of thP seauence in consideration. 2) There must be at 

not membrane lipoproteins, but at least half of them could be. 

References 

[2143] 

[1] Hayashi S., Wu H.C., J. Bioenerg. Biomembr. 22:451-471(1990).
[2] Klein P., Somorjai R.L., Lau P.C.K., Protein Eng. 2:15-20(1988).

[4] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D., Engelhard M., J. Biol. Chem. 269:14939-14945(1994).

[2144] 856. (Lipoprotein_7) 
Adhesin lipoprotein
[2145] This family consists of the p50 and variable adherence-associated antigen (Vaa) adhesins from Mycoplasma
hominis. M. hominis is a mycoplasma associated with human urogenital diseases, pneumonia, and septic arthritis [1].
An adhesin is a cell surface molecule that mediates adhesion to other cells or to the surrounding surface or substrate.
The Vaa antigen is a 50-kDa surface lipoprotein that has four tandem repetitive DNA sequences encoding a periodic
peptide structure, and is highly immunogenic in the human host [1]. p50 is also a 50-kDa lipoprotein, having three
repeats A, B and C, that may be a tetramer of 191 kDa in its native environment [2].

Number of members: 18

[2146]

[1] Zhang Q, Wise KS; Medline: 96294788 "Molecular basis of size and antigenic variation of a Mycoplasma hominis
adhesin encoded by divergent vaa genes." Infect Immun 1996;64:2737-2744.

[2] Henrich B, Kitzerow A, Feldmann RC, Schaal H, Hadding U; Medline: 97047675 "Repetitive elements of the
Mycoplasma hominis adhesin p50 can be differentiated by monoclonal antibodies." Infect Immun 1996;64:4027-4034.

[2147] 857. (MaoC_like) 
MaoC like domain 

[2148] The MaoC protein is found to share similarity with a wide variety of enzymes: estradiol 17 beta-dehydrogenase
4, peroxisomal hydratase-dehydrogenase-epimerase, and fatty acid synthase beta subunit. All these enzymes contain other
domains. This domain is also present in the NodN nodulation protein N. No specific function has been assigned to this
region of any of these proteins. The maoC gene is part of an operon with maoA, which is involved in the synthesis of
monoamine oxidase [1].

Number of members: 46

[2149] [1] Sugino H, Sasaki M, Azakami H, Yamashita M, Murooka Y; Medline: 96235221 "A monoamine-regulated
Klebsiella aerogenes operon containing the monoamine oxidase structural gene (maoA) and the maoC gene." J Bacteriol
1992;174:2485-2492.
[2150] 858. (MSP) 

Manganese-stabilizing protein / photosystem II polypeptide 

[2151] This family consists of the 33 kDa photosystem II polypeptide from the oxygen evolving complex (OEC) of
plants and cyanobacteria. The protein is also known as the manganese-stabilizing protein as it is associated with the
manganese complex of the OEC and may provide the ligands for the complex [1].

Number of members: 17

[2152] [1] Philbrick JB, Zilinskas BA; Medline: 88334494 "Cloning, nucleotide sequence and mutational analysis of
the gene encoding the Photosystem II manganese-stabilizing polypeptide of Synechocystis 6803." Mol Gen Genet
1988;212:418-425.
[2153] 859. (NAC) 

[2154] [1] Makarova KS, Aravind L, Galperin MY, Grishin NV, Tatusov RL, Wolf YI, Koonin EV; Medline: 99342100
"Comparative genomics of the Archaea (Euryarchaeota): evolution of conserved protein families, the stable core, and
the variable shell." Genome Res 1999;9:608-628.

Number of members: 27 

[2155] 860. (Nop) 

Putative snoRNA binding domain

[2156] This family consists of various pre-RNA processing ribonucleoproteins. The function of the aligned region is
unknown; however, it may be a common RNA or snoRNA or Nop1p binding domain. Nop5p (Nop58p) Swiss:Q12499
from yeast is the protein component of a ribonucleoprotein required for pre-18S rRNA processing and is suggested
to function with Nop1p in a snoRNA complex [1]. Nop56p Swiss:O00567 and Nop5p interact with Nop1p and
are required for ribosome biogenesis [2]. Prp31p Swiss:P49704 is required for pre-mRNA splicing in S. cerevisiae [3].
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Number of members: 23 



[2157] 

[...] interact with Nop1p and are required for ribosome biogenesis [...] PRP31 gene encodes [...]

1164-1170. 
[2158] 861. (Nramp)

Natural resistance-associated macrophage protein

The natural resistance-associated macrophage protein family consists of Nramp1, Nramp2, and yeast proteins
Smf1 and Smf2. The NRAMP family is a [...] defined by a conserved hydrophobic core of ten transmembrane
domains [5]. This family [...] are divalent cation transporters. Nramp1 is an integral membrane protein [...]
system and is recruited to the membrane of a phagosome upon phagocytosis [1]. [...] Nramp1 may regulate [...]
genetically predispose an individual to sus[...] yeast proteins Smf1 and Smf2 may also transport divalent cations [3].

Number of members: 36 
[2159] 

[1] Govoni G, Gros P; Medline: 98383996 "Macrophage NRAMP1 and its role in resistance to microbial infections." [...]

[...] Medline: 98294035 "Metal ion homeostasis and intracellular parasitism." Mol Microbiol 1998;28:403-412.

[...] Medline: 98030569 "Functional complementation of the yeast [...]" Proc Natl Acad Sci U S A [...]:10089-10093.
[2160] 862. (NTP_transf_2) 

Number of members: 83 

[2161] [1] Holm L, Sander C; Medline: 96005605 "DNA polymerase beta belongs to an ancient nucleotidyltransferase
superfamily." Trends Biochem Sci 1995;20:345-347.
[2162] 863. (Paramyxo_P)

Paramyxovirus P phosphoprotein

[2163] This family consists of paramyxovirus P phosphoprotein [...] Sendai virus and human and bovine
parainflu[...] form the P and L proteins [...]
Number of members: 15
[2164] 

[1] Bowman MC, Smallwood S, Moyer SA; Medline: 99329169 "Dissection of Individual Functions of the Sendai
Virus Phosphoprotein in Transcription." J Virol 1999;73:6474-6483.

[2] Matsuoka Y, Curran J, Pelet T, Kolakofsky D, Ray R, Compans RW; Medline: 91237868 "The P gene of human
parainfluenza virus type 1 encodes P and C proteins but not a cysteine-rich V protein." J Virol 1991;65:3406-3410.

[2165] 864. (Patatin)

[2166] This family consists of various patatin glycoproteins from plants. The patatin protein accounts for up to 40% 
of the total soluble protein in potato tubers [2]. Patatin is a storage protein but it also has the enzymatic activity of lipid 
acyl hydrolase, catalysing the cleavage of fatty acids from membrane lipids [2]. 

Number of members: 21

[2167]

[1] Banfalvi Z, Kostyal Z, Barta E; Medline: 95107249 "Solanum brevidens possesses a non-sucrose-inducible
patatin gene." Mol Gen Genet 1994;245:517-522.

[2] Mignery GA, Pikaard CS, Park WD; Medline: 88226014 "Molecular characterization of the patatin multigene
family of potato." Gene 1988;62:27-44.

[2168] 865. (Pentapeptide_2)
Pentapeptide repeats (8 copies)

[2169] These repeats are found in many mycobacterial proteins. They are most common in the PPE family
of proteins, where they are found in the MPTR subfamily of PPE proteins. The function of these repeats is unknown.
The repeat can be approximately described as XNXGX, where X can be any amino acid. These repeats are similar to
Pentapeptide [1]; however, it is not clear whether the two families are structurally related.
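
As a rough illustration of the XNXGX description above, the following Python sketch scans a protein sequence for tandem runs of the five-residue unit. The function name longest_tandem_run and the example sequence are hypothetical and are not part of the family definition. A plain regular expression is only an approximation of the motif; families of this kind are normally detected with profile-based methods, which this sketch does not attempt to reproduce.

```python
import re

# One or more back-to-back XNXGX units, where X is any residue.
# Assumption: the sequence is a plain one-letter amino acid string.
TANDEM_XNXGX = re.compile(r"(?:.N.G.)+")


def longest_tandem_run(seq):
    """Return (start, copies) for the longest run of consecutive XNXGX units.

    start is the 0-based index of the run; (-1, 0) means no unit was found.
    """
    best = (-1, 0)
    for match in TANDEM_XNXGX.finditer(seq):
        copies = len(match.group(0)) // 5  # each unit is five residues long
        if copies > best[1]:
            best = (match.start(), copies)
    return best


# Synthetic example: eight copies of ANAGA flanked by unrelated residues.
example = "MK" + "ANAGA" * 8 + "LL"
print(longest_tandem_run(example))
```

Because X matches any residue, the reading frame of a run is fixed only by the positions of N and G; a real scan would usually also require a minimum number of copies before reporting a hit.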


Number of members: 362 
[2170] 

[1] Bateman A, Murzin A, Teichmann SA; Medline: 98318059 "Structure and distribution of pentapeptide repeats
in bacteria." Protein Sci 1998;7:1477-1480.

[2] Cole ST, Brosch R, Parkhill J, Garnier T, Churcher C, Harris D, Gordon SV, Eiglmeier K, Gas S, Barry CE 3rd,
Tekaia F, Badcock K, Basham D, Brown D, Chillingworth T, Connor R, Davies R, Devlin K, Feltwell T, Gentles S,
Hamlin N, Holroyd S, Hornsby T, Jagels K, Barrell BG; Medline: 98295987 "Deciphering the biology of
Mycobacterium tuberculosis from the complete genome sequence." Nature 1998;393:537-544.

[2171] 866. (Peptidase_C13) 
Peptidase C13 family

This family of peptidases is known as the hemoglobinase family because it contains a globin degrading enzyme from
blood parasites Swiss:P42665. However, relatives are found in plants and other organisms that have other functions.
Members of this family are asparaginyl peptidases [1].

Number of members: 26 

[2172] [1] Chen JM, Dando PM, Rawlings ND, Brown MA, Young NE, Stevens RA, Hewitt E, Watts C, Barrett AJ;
Medline: 97218252 "Cloning, isolation, and characterization of mammalian legumain, an asparaginyl endopeptidase."
J Biol Chem 1997;272:8090-8098.
[2173] 867. (Pro_dh) 
Proline dehydrogenase 


Number of members: 25 

[2174] [1] Ling M, Allen SW, Wood JM; Medline: 95055736 "Sequence analysis identifies the proline dehydrogenase
and delta 1-pyrroline-5-carboxylate dehydrogenase domains of the multifunctional Escherichia coli PutA protein." J
Mol Biol 1994;243:950-956.

[2176] This family consists of the 23 kDa subunit of oxygen evolving system of photosystem II or PsbP from various
plants (where it is encoded by the nuclear genome) and Cyanobacteria. The 23 kDa PsbP protein is required for PSII
to be fully operational in vivo; it increases the affinity of the water oxidation site for Cl- and provides the conditions
required for high affinity binding of Ca2+ [2].

Number of members: 25 

[2177]

[1] Rova EM, Mc Ewen B, Fredriksson PO, Styring S; Medline: 97067138 "Photoactivation [...]
competing in a mutant of Chlamydomonas reinhardtii lacking the 23-kDa extrinsic subunit of photosystem II." J [...]

[2] Kochhar A, Khurana JP, Tyagi AK; Medline: 97191538 "Nucleotide sequence of the psbP gene encoding
precursor of 23-kDa polypeptide of oxygen-evolving complex in Arabidopsis thaliana and its expression in the
wild-type and a constitutively photomorphogenic mutant." DNA Res 1996;3:277-285.

[2179] The PUA domain, named after PseudoUridine synthase and Archaeosine transglycosylase, was detected in
archaeal and eukaryotic pseudouridine synthases, archaeal archaeosine synthases, a family of predicted ATPases
that may be involved in RNA modification, a family of predicted archaeal and bacterial rRNA [...]
the PUA domain was detected in a family of eukaryotic proteins that also contain a domain homolo[...]
initiation factor eIF1/SUI1; these proteins may comprise a novel type of [...]
domain was detected also in bacterial and yeast glutamate kinases; this is compatible with the demonstrated role of
these enzymes in the regulation of the expression of other genes [1]. It is predicted that the PUA domain is an RNA
binding domain.

Number of members: 48

[2180] [1] Aravind L, Koonin EV; Medline: 99193178 "Novel predicted RNA-binding domains associated with the
translation machinery." J Mol Evol 1999;48:291-302.
[2181] 870. (RF1)

[2182] Members of this family are peptide chain release factors. The eukaryotic Release Factor 1 proteins (eRF1)
are involved in termination of translation. The eRF1 protein is functional for all stop codons and appears to abolish
read-through of these codons. This family also includes other proteins for which the precise molecular function is
unknown. Many of them are from Archaebacteria. These proteins may also be involved in translation termination, but
this awaits experimental verification.

Number of members: 25 

[2183]

[1] Frolova L, Le Goff X, Rasmussen HH, Cheperegin S, Drugeon G, Kress M, Arman I, Haenni AL, Celis JE,
Philippe M, et al; Medline: 95082951 "A highly conserved eukaryotic protein family possessing properties of
polypeptide chain release factor." [see comments] Nature 1994;372:701-703.
[2] Drugeon G, Jean-Jean O, Frolova L, Le Goff X, Philippe M, Kisselev L, Haenni AL; Medline: 97315314
"Eukaryotic release factor 1 (eRF1) abolishes readthrough and competes with suppressor tRNAs at all three termination
codons in messenger RNA." Nucleic Acids Res 1997;25:2254-2258.



[2184] 871. (Ribosomal_L14e)
Ribosomal protein L14

[2185] This family includes the eukaryotic ribosomal protein L14.
Number of members: 15

[2186] 872. (Ribosomal_S27)
Ribosomal protein S27a 

[2187] This family of ribosomal proteins consists mainly of the 40S ribosomal protein S27a, which is synthesized as
a C-terminal extension of ubiquitin (CEP). The S27a domain comprises the C-terminal half of the protein. The
synthesis of ribosomal proteins as extensions of ubiquitin promotes their incorporation into nascent ribosomes by a
transient metabolic stabilization and is required for efficient ribosome biogenesis [3]. The ribosomal extension protein
S27a contains a basic region that is proposed to form a zinc finger; its fusion gene is proposed as a mechanism to
maintain a fixed ratio between ubiquitin, necessary for degrading proteins, and ribosomes, a source of proteins [2].

Number of members: 36 

[2188] 873. (Spermine_synth)
Spermine/spermidine synthase

[2189] Spermine and spermidine are polyamines. This family includes spermidine synthase that catalyses the fifth 
(last) step in the biosynthesis of spermidine from arginine, and spermine synthase. 

Number of members: 39 


[2190] 

[1] Mezquita J, Pau M, Mezquita C; Medline: 97449308 "Characterization and expression of two chicken cDNAs
encoding ubiquitin fused to ribosomal proteins of 52 and 80 amino acids." Gene 1997;195:313-319.
[2] Redman KL, Rechsteiner M; Medline: 89181932 "Identification of the long ubiquitin extension as ribosomal
protein S27a." Nature 1989;338:438-440.

[3] Finley D, Bartel B, Varshavsky A; Medline: 89181925 "The tails of ubiquitin precursors are ribosomal proteins
whose fusion to ubiquitin facilitates ribosome biogenesis." Nature 1989;338:394-401.

[2191] 874. (Surp) Surp module

[1] Denhez F, Lafyatis R; Medline: 94266805 "Conservation of regulated alternative splicing and identification of
functional domains in vertebrate homologs to the Drosophila splicing regulator, suppressor-of-white-apricot." J Biol Chem
1994;269:16170-16179.

[2192] This domain is also known as the SWAP domain. SWAP stands for Suppressor-of-White-APricot. It has been
suggested that these domains may be RNA binding [1].

Number of members: 32 

[2193] 875. (TFIIE) TFIIE alpha subunit
The general transcription factor TFIIE has an essential role in eukaryotic transcription initiation together with RNA
polymerase II and other general factors. Human TFIIE consists of two subunits, TFIIE-alpha Swiss:P29083 and
TFIIE-beta Swiss:P29084, and joins the preinitiation complex after RNA polymerase II and TFIIF [1]. This family consists of
the conserved amino terminal region of eukaryotic TFIIE-alpha [2] and proteins from archaebacteria that are also presumed
to be TFIIE-alpha subunits, Swiss:O29501 [3].

Number of members: 12 
[2194] 

[1] Ohkuma Y, Sumimoto H, Hoffmann A, Shimasaki S, Horikoshi M, Roeder RG; Medline: 92065982 "Structural
motifs and potential sigma homologies in the large subunit of human general transcription factor TFIIE." Nature
1991;354:398-401.

[2] Ohkuma Y, Hashimoto S, Roeder RG, Horikoshi M; Medline: 93087200 "Identification of two large subdomains
in TFIIE-alpha on the basis of homology between Xenopus and human sequences." Nucleic Acids Res
1992;20:5838-5838.

[3] Klenk HP, Clayton RA, Tomb JF, White O, Nelson KE, Ketchum KA, Dodson RJ, Gwinn M, Hickey EK, Peterson
JD, Richardson DL, Kerlavage AR, Graham DE, Kyrpides NC, Fleischmann RD, Quackenbush J, Lee NH, Sutton
GG, Gill S, Kirkness EF, Dougherty BA, McKenney K, Adams MD, Loftus B, Venter JC, et al; Medline: 98049343
"The complete genome sequence of the hyperthermophilic, sulphate-reducing archaeon Archaeoglobus fulgidus."



[2195] 876. (Transglut_core) 

[2196] Cross-reference(s) PS00547; TRANSGLUTAMINASES

Transglutaminases (EC 2.3.2.13) (TGase) [1,2] are calcium-dependent enzymes that catalyze the [...]
the formation of isopeptide bonds be[...] of [...] to proteins. The best known transglut[...]
of two catalytic A subunits and two non-catalytic B subunits. Factor XIII is responsible for cross-linking fibrin
[...] thus stabilizing the fibrin clot. Other forms of transglutaminases are widely distributed in vario[...]
and body fluids. Sequence data is available for the following forms of TGase:

- Transglutaminase K (TGase K), a membrane-bound enzyme found in mammalian epidermis and important for the
formation of the cornified cell envelope (gene TGM1).

- Tissue transglutaminase (TGase C), a monomeric ubiquitous enzyme located in the [...]

- Transglutaminase 3, responsible for the later stages of cell envelope formation in the epidermis and the hair follicle
(gene TGM3).

- Transglutaminase 4 (gene TGM4).

[2197] A conserved cysteine is known to be involved in the catalytic mechanism of TGases. The erythrocyte
membrane band 4.2 protein, which probably plays an important role in regulating the shape of erythrocytes and their
mechanical properties, is evolutionarily related to TGases. However, the active site cysteine is substituted by an alanine
and the 4.2 protein does not show TGase activity.

[2198] Consensus pattern: [GT]-Q-[CA]-W-V-x-[SA]-[GA]-[IVT]-x(2)-T-x-[LMSC]-R-[CSA [...] [The first C is the
active site residue] Sequences known to belong to this class detected by the pattern: ALL. Other sequence(s) detected [...]

[2199] [ 1] Ichinose A., Bottenus R.E., Davie E.W. J. Biol. Chem. 265:13411-13414(1990). [ 2] Greenberg C.S.,
Birckbichler P.J., Rice R.H. FASEB J. 5:3071-3077(1991).

[2200] 877. (TruB_N) TruB family pseudouridylate synthase (N terminal domain)

Members of this family are involved in modifying bases in RNA molecules. They carry out the [...]
bases to pseudouridine. This family includes TruB, a pseudouridylate synthase that specifically converts uracil 55 to
pseudouridine in most tRNAs. This family also includes Cbf5p that modifies rRNA [2].

Number of members: 33 

[2201] 

[1] Nurse K, Wrzesinski J, Bakin A, Lane BG, Ofengand J; Medline: 96079944 "Purification, cloning, and properties
of the tRNA psi 55 synthase from Escherichia coli." RNA 1995;1:102-112.

[2] Lafontaine DL, Bousquet-Antonelli C, Henry Y, Caizergues-Ferrer M, Tollervey D; Medline: 98139521 "The box
H + ACA snoRNAs carry Cbf5p, the putative rRNA pseudouridine synthase." Genes Dev 1998;12:527-537.

[2202] 878. (UDPGP) UTP-glucose-1-phosphate uridylyltransferase
This family consists of UTP-glucose-1-phosphate uridylyltransferases, EC:2.7.7.9. Also known as UDP-glucose
pyrophosphorylase (UDPGP) and Glucose-1-phosphate uridylyltransferase. UTP-glucose-1-phosphate uridylyltransferase
catalyses the interconversion of MgUTP + glucose-1-phosphate and UDP-glucose + MgPPi [1]. UDP-glucose
is an important intermediate in mammalian carbohydrate interconversion involved in various metabolic roles depending
on tissue type [1]. In Dictyostelium (slime mold) mutants in this enzyme abort the development cycle [2]. Also within
the family is UDP-N-acetylglucosamine Swiss:Q16222 or AGX1 [3] and two hypothetical proteins from Borrelia
burgdorferi, the Lyme disease spirochaete, Swiss:O51893 and Swiss:O51036.

Number of members: 18 

[2203] 

[1] Duggleby RG, Chao YC, Huang JG, Peng HL, Chang HY; Medline: 96202932 "Sequence differences between
human muscle and liver cDNAs for UDPglucose pyrophosphorylase and kinetic properties of the recombinant
enzymes expressed in Escherichia coli." Eur J Biochem 1996;235:173-179.

[2] Ragheb JA, Dottin RP; Medline: 87231075 "Structure and sequence of a UDP glucose pyrophosphorylase gene
of Dictyostelium discoideum." Nucleic Acids Res 1987;15:3891-3906.

[3] Mio T, Yabe T, Arisawa M, Yamada-Okabe H; Medline: 98269105 "The eukaryotic UDP-N-acetylglucosamine
pyrophosphorylases. Gene cloning, protein expression, and catalytic mechanism." J Biol Chem
1998;273:14392-14397.

[2204] 879. (UPF0044) Uncharacterized protein family UPF0044 signature
Cross-reference(s) PS01301; UPF0044
The following uncharacterized proteins have been shown [1] to be highly similar:

- Bacillus subtilis hypothetical protein yqeI.
- Escherichia coli hypothetical protein yhbY and HI1333, the corresponding Haemophilus influenzae protein.
- Methanococcus jannaschii hypothetical protein MJ0652.

These are small proteins of 10 to 15 Kd. They can be picked up in the database by the following pattern. This pattern
is located in the N-terminal part of these proteins.

[2205] Consensus pattern: L-[ST]-x(3)-K-x(3)-[KR]-[SGA]-x-[GA]-H-x-L-x-P-[LIV]-x(2)-[LIV]-[GA]-x(2)-G
Sequences known to belong to this class detected by the pattern: ALL. Other sequence(s) detected in SWISS-PROT: NONE.
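
The consensus pattern notation used above can be translated mechanically into a regular expression. The Python sketch below is an illustration only: the names prosite_to_regex, find_motif and UPF0044_PATTERN are hypothetical, the pattern string is the one given in paragraph [2205] as read from the text, and the test sequence is synthetic rather than a real protein. The translation covers only the elements that appear in this document (single residues, bracketed alternatives, braces for excluded residues, x for any residue, and repeat counts in parentheses); terminal anchors are not handled.

```python
import re

# Pattern from paragraph [2205], written with the usual hyphen-separated elements.
UPF0044_PATTERN = "L-[ST]-x(3)-K-x(3)-[KR]-[SGA]-x-[GA]-H-x-L-x-P-[LIV]-x(2)-[LIV]-[GA]-x(2)-G"

# An element is a core (residue, [set], {excluded set} or x) plus an optional
# repeat count such as (3) or (16,18).
_ELEMENT = re.compile(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a hyphen-separated consensus pattern into a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError("cannot parse pattern element: %r" % element)
        core, low, high = match.groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # any residue except those listed
        if low and high:
            core += "{%s,%s}" % (low, high)  # variable-length repeat
        elif low:
            core += "{%s}" % low             # fixed-length repeat
        parts.append(core)
    return "".join(parts)


def find_motif(pattern, seq):
    """Return the 0-based start of the first match in seq, or -1 if absent."""
    hit = re.search(prosite_to_regex(pattern), seq)
    return hit.start() if hit else -1


print(prosite_to_regex(UPF0044_PATTERN))
print(find_motif(UPF0044_PATTERN, "MM" + "LSAAAKAAAKSAGHALAPLAALGAAG"))
```

A regular expression of this kind reports only exact matches to the pattern; it gives no score for near misses, which is one reason profile-based searches are often preferred where they are available.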
[2206] 880. (zf-A20) A20-like zinc finger
A20- (an inhibitor of cell death)-like zinc fingers. The zinc finger mediates self-association in A20. These fingers
also mediate IL-1-induced NF-kappa B activation.

Number of members: 22 

[2207] 


[1] Heyninck K, Beyaert R; Medline: 99126071 "The cytokine-inducible zinc finger protein A20 inhibits IL-1-induced
NF-kappaB activation at the level of TRAF6." FEBS Lett 1999;442:147-150.

[2] De Valck D, Heyninck K, Van Criekinge W, Contreras R, Beyaert R, Fiers W; Medline: 96390831 "A20, an inhibitor
of cell death, self-associates by its zinc finger domain." FEBS Lett 1996;384:61-64.
[3] Song HY, Rothe M, Goeddel DV; Medline: 96270609 "The tumor necrosis factor-inducible zinc finger protein
A20 interacts with TRAF1/TRAF2 and inhibits NF-kappaB activation." Proc Natl Acad Sci U S A 1996;93:6721-6725.
[4] Opipari AW Jr, Boguski MS, Dixit VM; Medline: 90368626 "The A20 cDNA induced by tumor necrosis factor
alpha encodes a novel type of zinc finger protein." J Biol Chem 1990;265:14705-14708.

[2208] 881. (zf-PARP)

Poly(ADP-ribose) polymerase zinc finger domain

Cross-reference(s) PS00347; PARP_ZN_FINGER_1 PS50064; PARP_ZN_FINGER_2

[2209] Poly(ADP-ribose) polymerase (EC 2.4.2.30) (PARP) [1,2] is a eukaryotic enzyme that catalyzes the covalent
attachment of ADP-ribose units from NAD(+) to various nuclear acceptor proteins. This post-translational modification
of nuclear proteins is dependent on DNA. It appears to be involved in the regulation of various important cellular
processes such as differentiation, proliferation and tumor transformation as well as in the regulation of the molecular events
involved in the recovery of the cell from DNA damage. Structurally, PARP, about 1000 amino-acid residues long,
consists of three distinct domains: an N-terminal zinc-dependent DNA-binding domain, a central automodification
domain and a C-terminal NAD-binding domain. The DNA-binding region contains a pair of zinc finger domains which
have been shown to bind DNA in a zinc-dependent manner. The zinc finger domains of PARP seem to bind specifically
to single-stranded DNA. DNA ligase III [3] contains, in its N-terminal section, a single copy of a zinc finger highly similar
to those of PARP.

[2210] Consensus pattern: C-[KR]-x-C-x(3)-I-x-K-x(3)-[RG]-x(16,18)-W-[FYH]-H-x(2)-C [The three C's and the H are
zinc ligands] Sequences known to belong to this class detected by the pattern: ALL. Other sequence(s) detected in
SWISS-PROT: NONE. Sequences known to belong to this class detected by the profile: ALL. Other sequence(s) detected
in SWISS-PROT: NONE.
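
To make the role of the zinc ligands concrete, the Python sketch below hand-translates the consensus pattern of paragraph [2210] into a regular expression whose four capture groups are the three cysteines and the histidine. The names PARP_ZF and zinc_ligand_positions are hypothetical, the translation assumes the pattern as read from the text, and the example sequence is synthetic; it is offered as an illustration rather than as the detection method used by any database.

```python
import re

# Hand-translated from the consensus pattern in [2210].
# Capture groups 1-4 are the putative zinc ligands: C, C, H, C.
PARP_ZF = re.compile(r"(C)[KR].(C).{3}I.K.{3}[RG].{16,18}W[FYH](H).{2}(C)")


def zinc_ligand_positions(seq):
    """Return 1-based positions of the four ligand residues in the first match.

    An empty list means the pattern does not occur in seq.
    """
    match = PARP_ZF.search(seq)
    if match is None:
        return []
    return [match.start(group) + 1 for group in range(1, 5)]


# Synthetic sequence: two leading residues, one motif with a 17-residue spacer,
# then two trailing residues.
example = ("MA" + "CKAC" + "AAA" + "IAK" + "AAA" + "R"
           + "A" * 17 + "WFH" + "AA" + "C" + "GG")
print(zinc_ligand_positions(example))
```

Because the spacer x(16,18) is the only variable-length element, a match to this pattern always spans between 36 and 38 residues.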

[2211] Note: This documentation entry is linked to both signature patterns and a profile. As the profile is much more
sensitive than the patterns, you should use it if you have access to the necessary software tools to do so.

[ 1] Althaus F.R., Richter C.R. Mol. Biol. Biochem. Biophys. 37:1-126(1987).

[ 2] de Murcia G., Menissier de Murcia J. Trends Biochem. Sci. 19:172-176(1994).

[ 3] Wei Y.-F., Robins P., Carter K., Caldecott K., Pappin D.J.C., Yu G.-L., Wang R.-R., Shell B.K., Nash R.A., Schar
P., Barnes D.E., Haseltine W.A., Lindahl T. Mol. Cell. Biol. 15:3206-3216(1995).
A. Asparaginase 2

fixation and/or nitrogen metabolism in plants. 

[2213] Ref: Bon et al. (1997) Appl Biochem Biotechnol 63-65: 203-1[...]

B. Chloroa_b-bind

increase the rate of photosynthesis.

Ref: Leutwiler et al. (1986) Nucleic Acids Res 14: 4051-64 Brandt et al. (1992) Plant Mol Biol 19: 699-703

C. DMRL synthase 

Ref: Garcia-Ramirez et al. (1995) J Biol Chem 270: 23801-7

production of plants that are resistant to infection by double stranded DNA viruses.

Ref: Yang et al. (1993) PNAS USA 90: 5086-90

Ustav and Stenlund (1991) EMBO J 10: 449-57 
Callaway et al. (1996) Mol Plant Microbe Interact 9: 810-8 

E. EF1_G 

Ref: Kinzy et al. (1994) NAR 22: 2703-7 Dunn et al. (1993) Plant Mol Biol 23: 221-5 Aguilar et al. (1991) Plant Mol
Biol 17: 351-60

F. ENV polyprotein

[2218] This family comprises the envelope or coat proteins [...] are [...]lian species, retroviruses are
responsible for diseases such as [...]
retroviruses, in essence generating a new method for mutant production, gene tagging and the like.

Ref: Mamoun et al. (1990) J Virol 64: 4180-8 Grandbastien et al. (1989) Nature 337: 376-80 Wright and Voytas (1998)
Genetics 149: 703-15

G. Glycosyl_hydr9

[2219] Proteins having this domain (previously known as the glycosyl hydrolase family 5 domain) catalyze the
endohydrolysis of 1,4-β-D-glucosidic linkages in cellulose. Numerous plant proteins with this domain exist and are
expressed in an organ-specific manner. They are involved in the fruit ripening process, in cell elongation and in plant
reproduction. Modulation of the activity of these proteins, either by over- or under-expression or by mutation of the
polypeptide, could be used to affect post-harvest physiology (e.g. rate of ripening) or for engineering reproductive
sterility.

Ref: Giorda et al. (1990) Biochemistry 29: 7264-9 Tucker et al. (1988) Plant Physiol 88: 1257-62 Shani et al. (1997)
43: 837-42 Milligan and Gasser (1995) Plant Mol Biol 28: 691-711

H. Glycosyl_hydr14

[2220] The β-amylases (family 14 of glycosyl hydrolases) catalyze the hydrolysis of 1,4-α-glucosidic linkages in
polysaccharides and remove successive maltose units from the non-reducing ends of the chains. Mutants of β-amylase
in Arabidopsis exhibited altered degradation of starch throughout the diurnal cycle. In addition, the mutant phenotypes
indicated that these enzymes not only affect carbohydrate metabolism/catabolism, but also influence the amount of
pigment stored within particular cells. Manipulation of the β-amylase genes enables control of plant pigmentation (for
example, fibre pigment in cotton) as well as carbohydrate synthesis and degradation.

Ref: Zeeman et al. (1998) Plant J 15: 357-65 Hirano and Nakamura (1997) Plant Physiol 114: 5675-82 Kitamoto et
al. (1988) J Bacteriol 170: 5848-54

I. Glycosyl_hydr15

[2221] Glycosyl hydrolases from family 15 (such as 1,4-alpha-D-glucan glucohydrolase) catalyze the hydrolysis of
terminal 1,4-linked alpha-D-glucose residues successively from the non-reducing ends of the chains, resulting in the
release of β-D-glucose. In plants these proteins have been tied to the mobilization of the xyloglucan stored in the
cotyledonary cell walls. Proteins such as these could be varied to affect the rate of plant growth (for example during
germination), storage and/or use of glucose and other sugars by plant tissues and alteration of the properties, such
as elasticity, of plant cell walls.

Ref: Crombie et al. (1998) Plant J 15: 27-38 Hata et al. (1991) Agric Biol Chem 55: 941-9

J. Glycosyl_hydr20

[2222] Members of the family 20 glycosyl hydrolases catalyze the hydrolysis of terminal non-reducing
N-acetyl-D-hexosamine residues in N-acetyl-β-D-hexosaminides. N-acetyl-β-glucosaminidase belongs to this family and exists in
several different forms (consisting of various combinations of alpha and beta chains) depending on the organism.
Family 20 glycosyl hydrolases have been implicated in lysosomal storage diseases (such as Sandhoff disease) and
glycogen storage disease in humans. These types of proteins are also responsible for the hydrolysis of chitin. In plants,
these proteins could be useful in controlling carbohydrate catabolism, thereby influencing the amount of sugars
available for storage and/or use in other metabolic pathways. In addition, it is possible that such proteins could be used to
engineer an endogenous insect protection mechanism, e.g. by secretion of a chitin-hydrolyzing composition by the
plant.

Ref: Graham et al. (1988) J Biol Chem 263: 16823-9 O'Dowd et al. (1988) Biochemistry 27: 5216-26
K. HMG box

[2223] The HMG box is a novel type of DNA-binding domain found in a diverse group of proteins. Numerous plant 
proteins contain this domain, such as the HMG1/2-like proteins. The expression of some of these HMG proteins appears 
[...] et al. (1993) Nucleic Acids Res 21: 2493-[...] [...] et al. (1993) Plant Mol Biol 23: 613-23 [...] et
al. (1993) Plant Mol Biol 23: 619-25

L. IL2

[...] to IL-2 in mammals.

Ref: Heike et al. (1997) Scand J Immunol 45: 221-6 Ariel et al. (1998) J Immunol 161: 2465-72 Schink (1997)
Anticancer Drugs 8 Suppl 1: S47-51

M. Oxidored_FMN
tissues.

Ref: Baker et al. (1998) Proc Soc Exp Biol Med 217: 317-21 Schaller and Weiler (1997) J Biol Chem 272: 28066-72
Mandani et al. (1994) PNAS USA 91: 922-6

N. Oxidored_q2 

Ref: [...] et al. (1998) EMBO J 17: 363-76 [...] et al. (1996) Mol Gen [...] 258: 166-73 [...] et al. (1995) J
Mol Biol 251: 614-28

O. PABP

[2227] Polyadenylate binding proteins bind the poly(A) tail of mRNA [...]
numerous PABP genes that are expressed in an organ-specific [...] in numerous [...]

Ref: Hilson et al. (1993) Plant Physiol 103: 525-33 Belostotsky and Meagher (1993) PNAS USA 90: 6686-90

P. Parvo coat 
viruses. The coat proteins in these plant viruses are critical to the virus life cycle within the plant. For example, the coat
protein of MSV is thought to be involved in intra- and inter-cellular movement within the plant. Engineering of proteins 
having similarity to parvoviral coat proteins, especially to produce proteins that interfere with maturation of the virus 
particle, enables the production of plants having better resistance to natural plant single-stranded DNA viruses. 


Ref: Liu et al (1997) J Gen Virol 78: 1265-70 Rohde et al. (1990) Virology 176: 648-51 
Q. Pkinase_C 

[2229] Plant serine/threonine protein kinases possessing this domain are expressed in all tissues and are known to
undergo serine-specific autophosphorylation and specifically phosphorylate two ribosomal proteins, P14 and P16.
During development, these proteins predominate during high metabolic activity in growing buds, root tips, leaf margins
and germinating seeds. They are thought to be involved in the control of plant growth and development. In addition,
two genes encoding proteins from this family have been described that help plant cells adapt during cold or high salt
stresses. Consequently, engineering Pkinase_C proteins provides a way to control general growth/development of the
plant as well as a means to provide endogenous protection against environmental stresses.

Ref: Zhang et al. (1994) J Biol Chem 269: 17586-92 Mizoguchi et al. (1995) FEBS Lett 358: 199-204

R. REV

[2230] The REV proteins act post-transcriptionally to relieve repression of GAG and ENV production in
retroviruses such as Human Immunodeficiency Virus type I (HIV-1). Plants contain retrovirus-like elements such as
pararetroviruses and retrotransposons (i.e. transposons having long terminal repeats). Plant retrotransposons in
particular have been used to create mutations at various loci, thereby permitting gene isolation, gene tagging and the like.
Manipulation of plant REV proteins enables control of transposition frequencies of corresponding transposable
elements and provides a new tool for genetic engineering of plants.

Ref: Sodroski et al. (1986) Nature 321: 412-7 Franchini et al. (1989) PNAS USA 86: 2433-7 Marquet et al. (1995)
77: 113-24 Grandbastien et al. (1989) Nature 337: 376-80 Wright and Voytas (1998) Genetics 149: 703-15

S. RuBisCo small 

[2231] Ribulose 1,5-bisphosphate carboxylase/oxygenase (RuBisCo) catalyzes the initial step in the C3
photosynthetic carbon reduction cycle, adding carbon dioxide to D-ribulose 1,5-bisphosphate to form two molecules of
3-phospho-D-glycerate. RuBisCo comprises two subunits, one large, which is synthesized in the chloroplast, and one
small, which is synthesized in the cytoplasm and then transported into the chloroplast. The expression of the small
subunit of RuBisCo is light regulated. Manipulation of these proteins could increase the efficiency of photosynthesis
or allow alterations in developmental timing.

Ref: Giuliano et al. (1988) PNAS USA 85: 7089-93 Dedonder et al. (1993) Plant Physiol 101: 801-8 
T. Sialyltransf 

[2232] Members of the CMP-N-acetylneuraminate-β-galactosamide-α-2,3-sialyltransferase family catalyze the
following reaction:

CMP-N-acetylneuraminate + β-D-galactosyl-1,3-N-acetyl-α-D-galactosaminyl-R = CMP +
α-N-acetylneuraminyl-2,3-β-D-galactosyl-1,3-N-acetyl-alpha-D-galactosaminyl-R.

These proteins are thought to be responsible for the synthesis of
the sequence NeuAc-α-2,3-Gal-β-1,3-GalNAc- found on sugar chains O-linked to threonine or serine and also as a
terminal sequence on certain gangliosides in mammalian cells. In plants, glycosyltransferases in the Golgi apparatus
synthesize cell wall polysaccharides and elaborate the complex glycans of glycoproteins. Engineering of plant
sialyltransferases allows targeting of proteins to particular cellular locations or enables the making of changes in cell wall
structure.

Ref: Wee et al. (1998) Plant Cell 10: 1759-68 Lee et al. (1994) J Biol Chem 269: 10028-33 Kitagawa and Paulson
(1994) J Biol Chem 269: 1394-401
U. Signal

[2233] Many plant proteins in this family contain sequences simi[...]
otic family of signal transducers known as the two-component system [...] One family member in Arabi[...]
nutrient availability, etc.

Ref: Chang et al. (1993) Science 262: 539-[...] Nagaya et al. (1993) Gene 131: 119-124 Gottfert et al. (1990) PNAS
USA 87: 2680-4 

V. vMSA 

thus enabling targeting of the virus to particular species and/or tissues of plants. 

Ref: Okamoto et al. (1988) J Gen Virol 69: 2575-83 Grandbastien et al. (1989) Nature 337: 376-80 Wright and Voytas
(1998) Genetics 149: 703-15
W. zf-CCCH 

[2235] This family of proteins is defined by having [...]

[...]

regulators such as COP1, enables manipulation of some signal transduction pathways.
Ref: Pang et al. (1993) Nucleic Acids Res 21: 1647-53 Deng et al. (1992) Cell 71: 791-801 
X. zf-RanBP 

[2236] Proteins falling within this category contain many X-X-F-G and [...] or PPIase domains. Plant proteins having [...]

Proteins such as these may be useful for controlling growth and form of development.

Ref: Vittorioso et al. (1998) Mol Cell Biol 18: 3034-43; Hernandez Torres et al. (1995) 27: 1221-6

Y. Peptidase M48. 

[2237] [...] As an example, this family may be involved in [...] interaction [...].

Ref: Fujimura-Kamada et al. (1997) J Cell Biol 136: 271-85; Tam et al. (1998) J Cell Biol 142: 635-49.
Z. DNA Pol Viral N 

[2238] The DNA pol Viral N domain is located at the N-terminal region of DNA polymerase isolated from several retroid viruses such as the Cauliflower Mosaic Virus. The domain motif has also been found in numerous other species from humans to cyanobacteria. In these organisms, this motif seems to be associated with two types of sequences: retrotransposons and mitochondrial genes. In the mitochondrial sequences this domain is potentially involved in the self-splicing conducted by group II introns. Various manipulations of this gene in plants allow control of the numerous retrotransposons endogenous to plant genomes or allow engineering of mitochondrial function, especially to increase efficiency of energy utilization by cells.

Ref: Chapdelaine and Bonen (1991) Cell 65: 465-72; Ferat and Michel (1993) Nature 364: 358-61; Wilson et al. (1994) 368: 32-8; Cambareri et al. (1994) 242: 658-65; Gardner et al. (1981) NAR 9: 2871-2888; Cummings et al. (1990) Curr Genet 17: 375-402; Hattori et al. (1986) Nature 321: 625-8

Aa. Calpain_inhib

[2239] This domain is found in calpastatin, an inhibitor protein specific for calpain. Calpain is a non-lysosomal calcium-dependent intracellular protease that appears to be involved in the dynamic changes of the cytoskeleton, especially actin-related structures, during early Drosophila embryogenesis [1]. Calpastatins co-exist in cells with calpains and the subcellular distribution of calpastatin is thought to be important to calpain regulation [2]. In plants calpains and calpastatins could be involved in embryogenesis and non-embryogenic organ reiteration. Mutations occurring in calpain inhibitor repeat domains would produce developmental abnormalities such as abnormal leaf, root or flower development.

[2240] Refs 

1 Emori Y and Saigo K (1994) J Biol Chem 269: 25137-42. 

2 Mellgren RL, Lane RD, Mericle MT (1989) Biochim Biophys Acta 999: 71-77. 


Ab. chorismate_bind

[2241] Chorismate binding domains are present in plant anthranilate synthase (AS) genes. AS gene products catalyze the first step in the biosynthesis of tryptophan by converting chorismate and L-glutamine to anthranilate, pyruvate and L-glutamate. Some of these genes are involved in feedback inhibition by tryptophan [1] while some are feedback insensitive [2]. In Arabidopsis, two AS genes have overlapping, but different distributions. One of these AS genes is induced by wounding and bacterial pathogen infiltration [1]. Mutations in the chorismate binding domain would affect the production of tryptophan and could influence the plant's defense system. AS gene products can be used for in vitro synthesis of tryptophan and tryptophan derivatives.

[2242] Refs

1 Niyogi KK, Fink GR (1992) Plant Cell 4: 721-33.

2 Song HS, Brotherton JE, Gonzales RA, Widholm JM (1998) Plant Physiol 117: 533-43.

Ac. late_protein_L2

[2243] Papillomaviruses are encapsulated double stranded DNA viruses. Plants are susceptible to infection by double stranded DNA viruses such as Cauliflower Mosaic virus (CaMV). The coat proteins in these plant viruses are critical to the virus life cycle within the plant. For example, the coat protein of CaMV is thought to be involved in intra- and inter-cellular movement within the plant [1]. Engineering of proteins having similarity to papillomavirus coat proteins may enable the production of plants having better resistance to natural plant double stranded DNA viruses.

[2244] Refs

1 Thompson SR, Melcher U (1993) J Gen Virol 74: 1141-8.

Ad. Peptidase_M41

[2245] Proteins belonging to this peptidase family are metalloproteases that bind zinc as a cofactor and are integral membrane proteins. They seem to be involved in the degradation of carboxy-terminal-tagged cytoplasmic proteins. In plants these proteins are located in the thylakoid membranes of the chloroplasts, their expression is light regulated [...] be involved in degradation of soluble stromal proteins and turnover of thylakoid proteins [1].

Manipulation of expression and structure of these proteins would have effects on the efficiency of photosynthesis [...] the development of chloroplasts.

1 Lindahl M, Tabak S, Cseke L, Pichersky E, Andersson B, Adam Z (1996) J Biol Chem 271: 29329-34.



Ae. UPF0051 



[1,2]. Mutations in
production.

[2248] Refs

1 Kostrzewa M, Zetsche K (1992) J Mol Biol 227: 961-70.

2 Kostrzewa M, Zetsche K (1993) Plant Mol Biol 23: 67-76.



Af. E7 



or flowering. 
[2250] Refs

1 Zwerschke W, Jansen-Durr P (2000) Adv Cancer Res 78: 1-29.



Ag. Peptidase_U7

chloroplasts, thereby affecting growth and development. 

Watanabe A, Yamada M, Yasuda M, Tabata S (1996) DNA Res 3: 109-36.

A. Activities of Polypeptides Comprising Signal Peptides

[...] peptides are useful to modulate ligand-receptor interactions, [...] signal transduction, intracellular communication, and activities and/or chemical cascades that take [...]

also modulate signal transduction and communication between cells.

[2257] In addition, the signal peptide itself can serve as a ligand for some receptors. An example is the interaction of the ER targeting signal peptide with the signal recognition particle (SRP). Here, the SRP binds to the signal peptide, halting translation, and the resulting SRP complex then binds to docking proteins located on the surface of the ER, prompting transfer of the protein into the ER.

[2258] Signal peptide residue composition is described below in Subsection IV.C.1.

III. Methods of Modulating Polypeptide Production 

[2259] It is contemplated that polynucleotides of the invention can be incorporated into a host cell or in vitro system to modulate polypeptide production. For instance, the SDFs prepared as described herein can be used to prepare expression cassettes useful in a number of techniques for suppressing or enhancing expression.

[2260] For example, polynucleotides of the present invention comprising sequences to be transcribed, such as coding sequences, can be inserted into nucleic acid constructs to modulate polypeptide production. Typically, such sequences to be transcribed are heterologous to at least one element of the nucleic acid construct to generate a chimeric gene or construct.

[2261] Another example of useful polynucleotides is nucleic acid molecules comprising regulatory sequences of the present invention. Chimeric genes or constructs can be generated when the regulatory sequences of the invention are linked to heterologous sequences in a vector construct. Such chimeric genes and/or constructs are within the scope of the invention.

[2262] Also within the scope of the invention are nucleic acid molecules of which at least a part or fragment is presented in REF AND SEQ TABLES 1 AND 2 of the present application, and wherein the coding sequence is under the control of its own promoter and/or its own regulatory elements. Such molecules are useful for transforming the genome of a host cell or an organism regenerated from said host cell for modulating polypeptide production.

[2263] Additionally, a vector capable of producing the oligonucleotide can be inserted into the host cell to deliver the oligonucleotide.

[2264] Components to be included in vector constructs are described in more detail both above and below.

[2265] Whether the chimeric vectors or native nucleic acids are utilized, such polynucleotides can be incorporated into a host cell to modulate polypeptide production. Native genes and/or nucleic acid molecules can be effective when exogenous to the host cell.

[2266] Methods of modulating polypeptide expression include, without limitation:

Suppression methods, such as

Antisense
Ribozymes
Co-suppression
Insertion of Sequences into the Gene to be Modulated
Regulatory Sequence Modulation

as well as Methods for Enhancing Production, such as

Insertion of Exogenous Sequences; and
Regulatory Sequence Modulation.


III.A. Suppression 

[2267] Expression cassettes of the invention can be used to suppress expression of endogenous genes which comprise the SDF sequence. Inhibiting expression can be useful, for instance, to tailor the ripening characteristics of a fruit (Oeller et al., Science 254:437 (1991)) or to influence seed size (WO98/07842) or to provoke cell ablation (Mariani et al., Nature 357: 384-387 (1992)).

[2268] As described above, a number of methods can be used to inhibit gene expression in plants, such as antisense, ribozymes, introduction of exogenous genes into a host cell, insertion of a polynucleotide sequence into the coding sequence and/or the promoter of the endogenous gene of interest, and the like.

III.A.1. Antisense

[2269] An expression cassette as described above can be transformed into a host cell or plant to produce an antisense strand of RNA. For plant cells, antisense RNA inhibits gene expression by preventing the accumulation of mRNA which encodes the enzyme of interest, see, e.g., Sheehy et al., Proc. Nat. Acad. Sci. USA, 85:8805 (1988), and Hiatt et al., U.S. Patent No. 4,801,340.
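
As a rough illustration of what an antisense construct is designed to produce, the following sketch derives the antisense RNA for a short sense-strand DNA fragment. The fragment and the function name are made up for the example; neither is taken from REF AND SEQ TABLES 1 AND 2.

```python
# Hypothetical helper: derive the antisense RNA that pairs with the mRNA
# transcribed from a given sense (coding) strand of DNA.
COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}


def antisense_rna(sense_dna: str) -> str:
    """Return the antisense RNA (written 5'->3') for a sense-strand DNA (5'->3')."""
    sense_dna = sense_dna.upper()
    # Complement each base and reverse the order so the result reads 5'->3'.
    return "".join(COMPLEMENT[base] for base in reversed(sense_dna))


fragment = "ATGGCTTCA"  # made-up coding fragment, for illustration only
print(antisense_rna(fragment))  # UGAAGCCAU
```

The printed RNA is the reverse complement of the mRNA (AUGGCUUCA) that the sense strand would specify, which is why the two can anneal.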

III.A.2. Ribozymes 

[2270] Similarly, ribozyme constructs can be transformed into a plant to cleave mRNA and down-regulate translation.

III.A.3. Co-Suppression

[2271] Another method of suppression is by introducing an exogenous copy of the gene to be suppressed. Introduction of expression cassettes in which a nucleic acid is configured in the sense orientation with respect to the promoter has been shown to prevent the accumulation of mRNA. This method is described in detail above.

III.A.4. Insertion of Sequences into the Gene to be Modulated

[2272] Yet another means of suppressing gene expression is to insert a polynucleotide into the gene of interest to disrupt transcription or translation of the gene.

[2273] Homologous recombination could be used to target a polynucleotide insert to a gene [...]. A.C. Vergunst et al., Nucleic Acids Res. 26:2729 (1998); A.C. Vergunst et al., Plant Mol. Biol. 38:393 (1998); H. Albert [...].

[2274] In addition, random insertion of polynucleotides into a host cell genome can also be used to disrupt the gene of interest. Azpiroz-Leehan et al., Trends in Genetics 13:152 (1997). In this method, screening for clones from a library containing random insertions is preferred for identifying those that have polynucleotides inserted into the gene of interest. Such screening can be performed using probes and/or primers described above based on sequences from REF AND SEQ TABLES 1 AND 2, fragments thereof, and substantially similar sequence thereto. The screening can also be performed by selecting clones or any transgenic plants having a desired phenotype.

III.A.5. Regulatory Sequence Modulation

[2275] The SDFs described in REF and SEQ TABLES 1 and 2, and fragments thereof, are examples of nucleotides of the invention that contain regulatory sequences that can be used to suppress or inactivate transcription and/or translation from a gene of interest as discussed in I.C.5.

III.A.6. Genes Comprising Dominant-Negative Mutations

[2276] When suppression of production of the endogenous, native protein is desired it is often helpful to express a gene comprising a dominant negative mutation. Production of protein variants produced from genes comprising dominant negative mutations is a useful tool for research. Genes comprising dominant negative mutations can produce a variant polypeptide which is capable of competing with the native polypeptide, but which does not produce the native result. Consequently, over expression of genes comprising these mutations can titrate out an undesired activity of the native protein. For example, the product from a gene comprising a dominant negative mutation of a receptor can be used to constitutively activate or suppress a signal transduction cascade, allowing examination of the phenotype and thus the trait(s) controlled by that receptor and pathway. Alternatively, the protein arising from the gene comprising a dominant-negative mutation can be an inactive enzyme still capable of binding to the same substrate as the native protein and therefore competes with such native protein.

[2277] Products from genes comprising dominant-negative mutations can also act upon the native protein itself to prevent activity. For example, the native protein may be active only as a homo-multimer or as one subunit of a hetero-multimer. Incorporation of an inactive subunit into the multimer with native subunit(s) can inhibit activity.

[2278] Thus, gene function can be modulated in host cells of interest by insertion into these cells of vector constructs comprising a gene comprising a dominant-negative mutation.
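
The multimer case can be made concrete with a small calculation. The sketch below is only a toy model resting on two assumptions that the text does not state: subunits assemble at random, and a single inactive subunit is enough to inactivate the whole multimer. Under those assumptions, the fraction of multimers that remain fully native is (1 - f)^n, where f is the fraction of subunits that are the dominant-negative variant and n is the number of subunits per multimer. The function name is hypothetical.

```python
def active_multimer_fraction(mutant_fraction: float, subunits: int) -> float:
    """Fraction of multimers made only of native subunits.

    Toy model (assumed, not from the specification): random assembly, and
    one inactive subunit inactivates the entire multimer.
    """
    if not 0.0 <= mutant_fraction <= 1.0:
        raise ValueError("mutant_fraction must be between 0 and 1")
    if subunits < 1:
        raise ValueError("subunits must be at least 1")
    # Each of the n positions must independently be filled by a native subunit.
    return (1.0 - mutant_fraction) ** subunits


# Equal amounts of native and variant subunit in a tetramer:
print(active_multimer_fraction(0.5, 4))  # 0.0625
```

In this model an equimolar amount of variant subunit leaves only about 6% of tetramers fully native, which illustrates why over-expression of the variant can suppress activity far more than simple dilution would suggest.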

III.B. Enhanced Expression

[2279] Enhanced expression of a gene of interest in a host cell can be accomplished by either (1) insertion of an exogenous gene; or (2) promoter modulation.

III.B.1. Insertion of an Exogenous Gene

[2280] Insertion of an expression construct encoding an exogenous gene can boost the number of gene copies expressed in a host cell.

[2281] Such expression constructs can comprise genes that either encode the native protein that is of interest or that encode a variant that exhibits enhanced activity as compared to the native protein. Such genes encoding proteins of interest can be constructed from the sequences from REF AND SEQ TABLES 1 AND 2, fragments thereof, and substantially similar sequence thereto.

[2282] Such an exogenous gene can include either a constitutive promoter permitting expression in any cell in a host organism or a promoter that directs transcription only in particular cells or times during a host cell life cycle or in response to environmental stimuli.

III.B.2. Regulatory Sequence Modulation

[2283] The SDFs of REF and SEQ TABLES 1 AND 2, and fragments thereof, contain regulatory sequences that can be used to enhance expression of a gene of interest. For example, some of these sequences contain useful enhancer elements. In some cases, duplication of enhancer elements or insertion of exogenous enhancer elements will increase
expression of a desired gene from a particular promoter. As other examples, pol II promoters require binding of a regulatory protein to be activated, while some promoters may need a protein that signals a promoter binding protein to expose a polymerase binding site. In either case, over-production of such proteins can be used to enhance expression of a gene of interest by increasing the activation time of the promoter.

[2284] Such regulatory proteins are encoded by some of the sequences in REF AND SEQ TABLES 1 AND 2, fragments thereof, and substantially similar sequences thereto.

[2285] Coding sequences for these proteins can be constructed as described above.


IV. Gene Constructs and Vector Construction 

[2286] To use isolated SDFs of the present invention or a combination of them or parts and/or mutants and/or fusions of said SDFs in the above techniques, recombinant DNA vectors which comprise said SDFs and are suitable for transformation of cells, such as plant cells, are usually prepared. The SDF construct can be made using standard recombinant DNA techniques (Sambrook et al. 1989) and can be introduced to the species of interest by Agrobacterium-mediated transformation or by other means of transformation (e.g., particle gun bombardment) as referenced below.

[2287] The vector backbone can be any of those typical in the art such as plasmids, viruses, artificial chromosomes, BACs, YACs and PACs and vectors of the sort described by

(a) BAC: Shizuya et al., Proc. Natl. Acad. Sci. USA 89: 8794-8797 (1992); Hamilton et al., Proc. Natl. Acad. Sci. USA 93: 9975-9979 (1996);

(b) YAC: Burke et al., Science 236: 806-812 (1987);

(c) PAC: Sternberg N. et al., Proc Natl Acad Sci USA. Jan;87(1): 103-7 (1990);

(d) Bacteria-Yeast Shuttle Vectors: Bradshaw et al., Nucl Acids Res 23: 4850-4856 (1995);

(e) Lambda Phage Vectors: Replacement Vector, e.g., Frischauf et al., J. Mol Biol 170: 827-842 (1983); or Insertion vector, e.g., Huynh et al., In: Glover NM (ed) DNA Cloning: A Practical Approach, Vol. 1, Oxford: IRL Press (1985);

(f) T-DNA gene fusion vectors: Walden et al., Mol Cell Biol 1: 175-194 (1990); and

(g) Plasmid vectors: Sambrook et al., infra.


[2288] Typically, a vector will comprise the exogenous gene, which in its turn comprises an SDF of the present invention to be introduced into the genome of a host cell, and which gene may be an antisense construct, a ribozyme construct, a chimeraplast, or a coding sequence with any desired transcriptional and/or translational regulatory sequences, such as promoters, UTRs, and 3' end termination sequences. Vectors of the invention can also include origins of replication, scaffold attachment regions (SARs), markers, homologous sequences, introns, etc.

[2289] A DNA sequence coding for the desired polypeptide, for example a cDNA sequence encoding a full length protein, will preferably be combined with transcriptional and translational initiation regulatory sequences which will direct the transcription of the sequence from the gene in the intended tissues of the transformed plant.

[2290] For example, for over-expression, a plant promoter fragment may be employed that will direct transcription of the gene in all tissues of a regenerated plant. Alternatively, the plant promoter may direct transcription of an SDF of the invention in a specific tissue (tissue-specific promoters) or may be otherwise under more precise environmental control (inducible promoters).

[2291] If proper polypeptide production is desired, a polyadenylation region at the 3'-end of the coding region is typically [...]

IV.A. Coding Sequences
produced. 



IV.B. Promoters 



[2294] As explained above, introducing an exogenous SDF from the same species [...] Over-expression of an SDF transgene can [...] species as demonstrated by similar analysis of the [...] be manipulated or its spatial localization in an organ- or tissue-specific manner can be altered.

IV.C. Signal Peptides

[2296] SDFs of the present invention containing signal peptides are indicated in the REF and SEQ TABLES [...] location in mature, folded proteins. Signal peptides also serve as ligands for some receptors. [...] control the phenotypic expression of [...]

of the protein in specific organelles (plastids, as an example), secretion outside of the cell, targeting interaction with particular receptors, etc. Hence, the inclusion of signal proteins in constructs involving the SDFs of the invention increases the range of manipulation of SDF phenotypic expression. The nucleotide sequence of the signal peptide can be isolated from characterized genes using common molecular biological techniques or can be synthesized in vitro.

[2299] In addition, the native signal peptide sequences, both amino acid and nucleotide, described in the REF and SEQ tables can be used to modulate polypeptide transport. Further variants of the native signal peptides described in the REF and SEQ tables are contemplated. Insertions, deletions, or substitutions can be made. Such variants will retain at least one of the functions of the native signal peptide as well as exhibiting some degree of sequence identity to the native sequence.

[2300] Also, fragments of the signal peptides of the invention are useful and can be fused with other signal peptides of interest to modulate transport of a polypeptide.

V. Transformation Techniques 

[2301] A wide range of techniques for inserting exogenous polynucleotides are known for a number of host cells, including, without limitation, bacterial, yeast, mammalian, insect and plant cells.

[2302] Techniques for transforming a wide variety of higher plant species are well known and described in the technical and scientific literature. See, e.g., Weising et al., Ann. Rev. Genet. 22:421 (1988); and Christou, Euphytica, v. 85, n.1-3:13-27 (1995).

[2303] DNA constructs of the invention may be introduced into the genome of the desired plant host by a variety of conventional techniques. For example, the DNA construct may be introduced directly into the genomic DNA of the plant cell using techniques such as electroporation and microinjection of plant cell protoplasts, or the DNA constructs can be introduced directly to plant tissue using ballistic methods, such as DNA particle bombardment. Alternatively, the DNA constructs may be combined with suitable T-DNA flanking regions and introduced into a conventional Agrobacterium tumefaciens host vector. The virulence functions of the Agrobacterium tumefaciens host will direct the insertion of the construct and adjacent marker into the plant cell DNA when the cell is infected by the bacteria (McCormac et al., Mol. Biotechnol. 8:199 (1997); Hamilton, Gene 200:107 (1997); Salomon et al., EMBO J. 3:141 (1984); Herrera-Estrella et al., EMBO J. 2:987 (1983)).

[2304] Microinjection techniques are known in the art and well described in the scientific and patent literature. The introduction of DNA constructs using polyethylene glycol precipitation is described in Paszkowski et al., EMBO J. 3:2717 (1984). Electroporation techniques are described in Fromm et al., Proc. Natl. Acad. Sci. USA 82:5824 (1985). Ballistic transformation techniques are described in Klein et al., Nature 327:773 (1987). Agrobacterium tumefaciens mediated transformation techniques, including disarming and use of binary or cointegrate vectors, are well described in the scientific literature. See, for example, Hamilton, C.M., Gene 200:107 (1997); Muller et al., Mol. Gen. Genet. 207:171 (1987); Komari et al., Plant J. 10:165 (1996); Venkateswarlu et al., Biotechnology 9:1103 (1991); Gleave, A.P., Plant Mol. Biol. 20:1203 (1992); Graves and Goldman, Plant Mol. Biol. 7:34 (1986); and Gould et al., Plant Physiology 95:426 (1991).

[2305] Transformed plant cells which are derived by any of the above transformation techniques can be cultured to regenerate a whole plant that possesses the transformed genotype and thus the desired phenotype such as seedlessness. Such regeneration techniques rely on manipulation of certain phytohormones in a tissue culture growth medium, typically relying on a biocide and/or herbicide marker which has been introduced together with the desired nucleotide sequences. Plant regeneration from cultured protoplasts is described in Evans et al., "Protoplasts Isolation and Culture" in "Handbook of Plant Cell Culture," pp. 124-176, MacMillan Publishing Company, New York, 1983; and Binding, Regeneration of Plants, Plant Protoplasts, pp. 21-73, CRC Press, Boca Raton, 1988. Regeneration can also be obtained from plant callus, explants, organs, or parts thereof. Such regeneration techniques are described generally in Klee et al., Ann. Rev. of Plant Phys. 38:467 (1987). Regeneration of monocots (rice) is described by Hosoyama et al. (Biosci. Biotechnol. Biochem. 58:1500 (1994)) and by Ghosh et al. (J. Biotechnol. 32:1 (1994)). The nucleic acids of the invention can be used to confer desired traits on essentially any plant.

[2306] Thus, the invention has use over a broad range of plants, including species from the genera Anacardium, Arachis, Asparagus, Atropa, Avena, Brassica, Citrus, Citrullus, Capsicum, Carthamus, Cocos, Coffee, Cucumis, Cucurbita, Daucus, Elaeis, Fragaria, Glycine, Gossypium, Helianthus, Heterocallis, Hordeum, Hyoscyamus, Lactuca, Linum, Lolium, Lupinus, Lycopersicon, Malus, Manihot, Majorana, Medicago, Nicotiana, Olea, Oryza, Panicum, Pannesetum, Persea, Phaseolus, Pistachia, Pisum, Pyrus, Prunus, Raphanus, Ricinus, Secale, Senecio, Sinapis, Solanum, Sorghum, Theobromus, Trigonella, Triticum, Vicia, Vitis, Vigna, and Zea.

[2307] One of skill will recognize that after the expression cassette is stably incorporated in transgenic plants and confirmed to be operable, it can be introduced into other plants by sexual crossing. Any of a number of standard breeding techniques can be used, depending upon the species to be crossed.

[2308] The particular sequences of SDFs identified are provided in the attached REF AND SEQ TABLES 1 AND 2. One of ordinary skill in the art, having this data, can obtain cloned DNA fragments, synthetic DNA fragments or polypeptides constituting desired sequences by recombinant methodology known in the art or described herein.



EXAMPLES 

[2309] The invention is illustrated by way of the following examples. The invention is not limited by these examples as the scope of the invention is defined solely by the claims following.

EXAMPLE 1: cDNA PREPARATION 

[2310] A number of the nucleotide sequences disclosed in REF AND SEQ TABLES 1 AND 2 herein as representative of the SDFs of the invention can be obtained by sequencing genomic DNA (gDNA) and/or cDNA from corn plants grown from HYBRID SEED # 35A19, purchased from Pioneer Hi-Bred International, Inc., Supply Management, P.O. Box 256, Johnston, Iowa 50131-0256.

[2311] A number of the nucleotide sequences disclosed in REF AND SEQ TABLES 1 AND 2 herein as representative of the SDFs of the invention can also be obtained by sequencing genomic DNA from Arabidopsis thaliana, Wassilewskija ecotype, or by sequencing cDNA obtained from mRNA from such plants as described below. This is a true breeding strain. Seeds of the plant are available from the Arabidopsis Biological Resource Center at the Ohio State University, under the accession number CS2360. Seeds of this plant were deposited under the terms and conditions of the Budapest Treaty at the American Type Culture Collection, Manassas, VA on August 31, 1999, and were assigned ATCC [...]

[2312] Other methods for cloning full-length cDNA are described, for example, by Seki et al., Plant Journal 15:707-720 (1998) "High-efficiency cloning of Arabidopsis full-length cDNA by biotinylated Cap trapper"; Maruyama et al., Gene 138:171 (1994) "Oligo-capping: a simple method to replace the cap structure of eukaryotic mRNAs with oligoribonucleotides"; and WO 96/34981.

[2313] Each tissue or organ was individually pulverized and frozen in liquid nitrogen. Next, the samples were homogenized in the presence of detergents and then centrifuged. The debris and nuclei were removed from the sample and more detergents were added to the sample. The sample was centrifuged and the debris was removed. Then the sample was applied to a 2M sucrose cushion to isolate polysomes. The RNA was isolated by treatment with detergents and proteinase K followed by ethanol precipitation and centrifugation. The polysomal RNA from the different tissues was pooled according to the following mass ratios: 15/15/1 for male inflorescences, female inflorescences and root, respectively. The pooled material was then used for cDNA synthesis by the methods described below.
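
The pooling step is simple proportional arithmetic, sketched below. The 15/15/1 ratio is the one stated above; the total of 31 micrograms and the function name are assumptions chosen only to make the numbers easy to follow.

```python
def pool_masses(total_mass: float, ratios: dict[str, float]) -> dict[str, float]:
    """Split a total RNA mass among tissues according to parts-by-mass ratios."""
    total_parts = sum(ratios.values())
    if total_parts <= 0:
        raise ValueError("ratios must sum to a positive number")
    return {tissue: total_mass * parts / total_parts
            for tissue, parts in ratios.items()}


# Stated ratio: 15/15/1 for male inflorescences, female inflorescences and root.
corn_ratio = {"male inflorescence": 15, "female inflorescence": 15, "root": 1}

# Assumed total of 31 micrograms of polysomal RNA (illustrative value only).
for tissue, mass in pool_masses(31.0, corn_ratio).items():
    print(f"{tissue}: {mass:.1f} ug")
```

The same helper applies to any other parts-by-mass ratio by passing a different dictionary.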
[2314] Starting material for cDNA synthesis for the exemplary corn cDNA clones with sequences presented in REF AND SEQ TABLES 1 AND 2 was poly(A)-containing polysomal mRNAs from inflorescences and root tissues of corn plants grown from HYBRID SEED # 35A19. Male inflorescences and female (pre- and post-fertilization) inflorescences were isolated at various stages of development. Selection for poly(A)-containing polysomal RNA was done using oligo d(T) cellulose columns, as described by Cox and Goldberg, "Plant Molecular Biology: A Practical Approach", pp. 1-35, Shaw ed., c. 1988 by IRL, Oxford. The quality and the integrity of the polyA+ RNAs were evaluated.
[2315] Starting material for cDNA synthesis for the exemplary Arabidopsis cDNA clones with sequences presented in REF AND SEQ TABLES 1 AND 2 was polysomal RNA isolated from the top-most inflorescence tissues of Arabidopsis thaliana Wassilewskija (Ws.) and from roots of Arabidopsis thaliana Landsberg erecta (L. er.), also obtained from the Arabidopsis Biological Resource Center. Nine parts inflorescence to every part root were used, as measured by wet mass. Tissue was pulverized and exposed to liquid nitrogen. Next, the sample was homogenized in the presence of detergents and then centrifuged. The debris and nuclei were removed from the sample and more detergents were added to the sample. The sample was centrifuged and the debris was removed and the sample was applied to a 2M sucrose cushion to isolate polysomal RNA. Cox et al., "Plant Molecular Biology: A Practical Approach", pp. 1-35, Shaw ed., c. 1988 by IRL, Oxford. The polysomal RNA was used for cDNA synthesis by the methods described below. Polysomal mRNA was then isolated as described above for corn cDNA. The quality of the RNA was assessed electrophoretically.

[2316] Following preparation of the mRNAs from various tissues as described above, selection of mRNA with intact 5' ends and specific attachment of an oligonucleotide tag to the 5' end of such mRNA was performed using either a chemical or enzymatic approach. Both techniques take advantage of the presence of the "cap" structure, which characterizes the 5' end of most intact mRNAs and which comprises a guanosine generally methylated once, at the 7 position.

[2317] The chemical modification approach involves the optional elimination of the 2', 3'-cis diol of the 3' terminal ribose, the oxidation of the 2', 3'-cis diol of the ribose linked to the cap of the 5' ends of the mRNAs into a dialdehyde, and the coupling of the dialdehyde thus obtained to a derivatized oligonucleotide tag. Further detail regarding the chemical approaches for obtaining mRNAs having intact 5' ends is disclosed in International Application No. WO96/34981, published November 7, 1996.

[2318] The enzymatic approach for ligating the oligonucleotide tag to the intact 5' ends of mRNAs involves the removal of the phosphate groups present on the 5' ends of uncapped incomplete mRNAs, the subsequent decapping of mRNAs having intact 5' ends and the ligation of the phosphate present at the 5' end of the decapped mRNA to an oligonucleotide tag. Further detail regarding the enzymatic approaches for obtaining mRNAs having intact 5' ends is disclosed in Dumas Milne Edwards J.B. (Doctoral Thesis of Paris VI University, Le clonage des ADNc complets: difficultés et perspectives nouvelles. Apports pour l'étude de la régulation de l'expression de la tryptophane hydroxylase de rat, 20 Dec. 1993), EPO 625572 and Kato et al., Gene 150:243-250 (1994).

[2319] In both the chemical and the enzymatic approach, the oligonucleotide tag has a restriction enzyme site (e.g.
an EcoRI site) therein to facilitate later cloning procedures. Following attachment of the oligonucleotide tag to the
mRNA, the integrity of the mRNA is examined by performing a Northern blot using a probe complementary to the
oligonucleotide tag.

[2320] For the mRNAs joined to oligonucleotide tags using either the chemical or the enzymatic method, first strand
cDNA synthesis is performed using an oligo-dT primer with reverse transcriptase. This oligo-dT primer can contain an
internal tag of at least 4 nucleotides, which can be different from one mRNA preparation to another. Methylated dCTP
is used for cDNA first strand synthesis to protect the internal EcoRI sites from digestion during subsequent steps. The
first strand cDNA is precipitated using isopropanol after removal of RNA by alkaline hydrolysis to eliminate residual
primers.

[2321] Second strand cDNA synthesis is conducted using a DNA polymerase, such as Klenow fragment, and a primer
corresponding to the 5' end of the ligated oligonucleotide. The primer is typically 20-25 bases in length. Methylated
dCTP is used for second strand synthesis in order to protect internal EcoRI sites in the cDNA from digestion during
the cloning process.

[2322] Following second strand synthesis, the full-length cDNAs are cloned into a phagemid vector, such as
pBlueScript™ (Stratagene). The ends of the full-length cDNAs are blunted with T4 DNA polymerase (Biolabs) and the cDNA
is digested with EcoRI. Since methylated dCTP is used during cDNA synthesis, the EcoRI site present in the tag is the
only hemi-methylated site, and hence the only site susceptible to EcoRI digestion. In some instances, to facilitate
subcloning, a HindIII adapter is added to the 3' end of full-length cDNAs.

[2323] The full-length cDNAs are then size fractionated using either exclusion chromatography (AcA, Biosepra) or
electrophoretic separation, which yields 3 to 6 different fractions. The full-length cDNAs are then directionally cloned
into pBlueScript™ using either the EcoRI and SmaI restriction sites or, when the HindIII adapter is present in
the full-length cDNAs, the EcoRI and HindIII restriction sites. The ligation mixture is transformed, preferably by
electroporation, into bacteria, which are then propagated under appropriate antibiotic selection.
[2324] Clones containing the oligonucleotide tag attached to full-length cDNAs are selected as follows.

[2325] The plasmid cDNA libraries made as described above are purified (e.g. by a column available from Qiagen).
A positive selection of the tagged clones is performed as follows. Briefly, in this selection procedure, the plasmid DNA
is converted to single stranded DNA using phage F1 gene II endonuclease in combination with an exonuclease (Chang
et al., Gene 127:95 (1993)) such as exonuclease III or T7 gene 6 exonuclease. The resulting single stranded DNA is
then purified using paramagnetic beads as described by Fry et al., Biotechniques 13:124 (1992). Here the single
stranded DNA is hybridized with a biotinylated oligonucleotide having a sequence corresponding to the 3' end of the
oligonucleotide tag. Preferably, the primer has a length of 20-25 bases. Clones including a sequence complementary
to the biotinylated oligonucleotide are selected by incubation with streptavidin coated magnetic beads followed by
magnetic capture. After capture of the positive clones, the plasmid DNA is released from the magnetic beads and
converted into double stranded DNA using a DNA polymerase such as ThermoSequenase™ (obtained from Amersham
Pharmacia Biotech). Alternatively, protocols such as the Gene Trapper™ kit (Gibco BRL) can be used. The double
stranded DNA is then transformed, preferably by electroporation, into bacteria. The percentage of positive clones
having the 5' tag oligonucleotide is typically estimated to be between 90 and 98% from dot blot analysis.

[2326] Following transformation, the libraries are ordered in microtiter plates and sequenced. The Arabidopsis library
was deposited at the American Type Culture Collection on January 7, 2000 as "E. coli liba 010600" under the accession
number PTA-1161.


EXAMPLE 2: SOUTHERN HYBRIDIZATIONS 

[2327] The SDFs of the invention can be used in Southern hybridizations as described above. The following describes
extraction of DNA from nuclei of plant cells, digestion of the nuclear DNA and separation by length, transfer of the
separated fragments to membranes, preparation of probes for hybridization, hybridization and detection of the
hybridized probe.

[2328] The procedures described herein can be used to isolate related polynucleotides or for diagnostic purposes.
Moderate stringency hybridization conditions, as defined above, are described in the present example. These
conditions result in detection of hybridization between sequences having at least 70% sequence identity. The
hybridization and wash conditions can be changed to reflect the desired percentage of sequence identity between
probe and target sequences that can be detected.

[2329] In the following procedure, a probe for hybridization is produced from two PCR reactions using two primers
[...]. The [...] first PCR product is assessed to validate the size of the primer to assure [...]. The product of the
first PCR is used as a template, with the same pair of primers used in the first PCR, in a second [...] genomic DNA
fragments by known methods for further purification and/or characterization.

Buffers for nuclear DNA extraction

[2332]

1. 10X HB

Component                 Per 1000 ml   Function
40 mM spermidine          10.2 g        Spermine (Sigma S-2876) and spermidine (Sigma S-2501)
10 mM spermine            3.5 g         stabilize chromatin and the nuclear membrane
0.1 M EDTA (disodium)     37.2 g        EDTA inhibits nuclease
0.1 M Tris                12.1 g        Buffer
0.8 M KCl                 59.6 g        Adjusts ionic strength for stability of nuclei

Adjust pH to 9.5 with 10 N NaOH. It appears that there is a nuclease present in leaves; use of pH 9.5 appears
to inactivate this nuclease.

2. [...] to final volume; stir constantly until it has dissolved. Bring the solution to volume.
3. Sarkosyl solution (lyses nuclear membranes)

Component                         Per 1000 ml
N-lauroyl sarcosine (Sarkosyl)    20.0 g
0.1 M Tris                        12.1 g
0.04 M EDTA (disodium)            14.9 g

Adjust the pH to 9.5 after all the components are dissolved and bring up to the proper volume.



4. 20% Triton X-100

80 ml Triton X-100

320 ml 1X HB (w/o β-ME and PMSF)

Prepare in advance; Triton takes some time to dissolve.
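
The weights listed for the 10X HB and Sarkosyl buffers can be cross-checked against their stated molarities. The sketch below is only illustrative: the molar masses are standard reference values, but the salt and hydrate forms (spermidine trihydrochloride, spermine tetrahydrochloride, EDTA disodium dihydrate, Tris base) are assumptions inferred from the listed weights, since the recipes do not name them.

```python
# Molar masses (g/mol). The salt and hydrate forms are assumptions inferred from
# the listed weights; the recipes themselves do not state them.
MOLAR_MASS = {
    "spermidine trihydrochloride": 254.63,
    "spermine tetrahydrochloride": 348.18,
    "EDTA disodium dihydrate": 372.24,
    "Tris base": 121.14,
    "KCl": 74.55,
}

# (buffer, solute, molarity in mol/l, grams per 1000 ml as listed in the recipe)
RECIPES = [
    ("10X HB", "spermidine trihydrochloride", 0.040, 10.2),
    ("10X HB", "spermine tetrahydrochloride", 0.010, 3.5),
    ("10X HB", "EDTA disodium dihydrate", 0.100, 37.2),
    ("10X HB", "Tris base", 0.100, 12.1),
    ("10X HB", "KCl", 0.800, 59.6),
    ("Sarkosyl solution", "Tris base", 0.100, 12.1),
    ("Sarkosyl solution", "EDTA disodium dihydrate", 0.040, 14.9),
]


def grams_needed(molarity, molar_mass, volume_l=1.0):
    """Mass of solute (g) required for a given molarity and final volume."""
    return molarity * molar_mass * volume_l


def check_recipes(tolerance_g=0.1):
    """Return (buffer, solute, computed_g, listed_g, within_tolerance) per entry."""
    rows = []
    for buffer, solute, molarity, listed_g in RECIPES:
        computed = grams_needed(molarity, MOLAR_MASS[solute])
        ok = abs(computed - listed_g) <= tolerance_g
        rows.append((buffer, solute, round(computed, 2), listed_g, ok))
    return rows


if __name__ == "__main__":
    for row in check_recipes():
        print(row)
```

Passing a different `volume_l` to `grams_needed` scales a recipe to a smaller batch.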






A. Procedure

[2333]

1. Prepare 1X HB buffer (keep ice-cold during use)

Component            Per 1000 ml
10X HB               100 ml
2 M sucrose          250 ml       a non-ionic osmoticum
Water                634 ml

Added just before use:

100 mM PMSF*         10 ml        a protease inhibitor; protects nuclear membrane proteins
β-mercaptoethanol    1 ml         inactivates nuclease by reducing disulfide bonds

*100 mM PMSF (phenylmethylsulfonyl fluoride, Sigma P-7626): add 0.0875 g to 5 ml 100% ethanol.



2. Homogenize the tissue in a blender (use 300-400 ml of 1X HB per blender). Be sure that you use 5-10 ml of HB
buffer per gram of tissue. Blenders generate heat, so be sure to keep the homogenate cold. It is necessary to put
the blenders in ice periodically.

3. Add the 20% Triton X-100 (25 ml per liter of homogenate) and gently stir on ice for 20 min. This lyses plastid, 
but not nuclear, membranes. 

4. Filter the tissue suspension through several nylon filters into an ice-cold beaker. The first filtration is through a
250-micron membrane; the second is through an 85-micron membrane; the third is through a 50-micron membrane;
and the fourth is through a 20-micron membrane. Use a large funnel to hold the filters. Filtration can be sped up
by gently squeezing the liquid through the filters.

5. Centrifuge the filtrate at 1200 x g for 20 min. at 4°C to pellet the nuclei.

6. Discard the dark green supernatant. The pellet will have several layers to it. One is starch; it is white and gritty.
The nuclei are gray and soft. In the early steps, there may be a dark green and somewhat viscous layer of
chloroplasts.

Wash the pellets in about 25 ml cold H buffer (with Triton X-100) and resuspend by swirling gently and pipetting.
After the pellets are resuspended, pellet the nuclei again at 1200-1300 x g. Discard the supernatant.

Repeat the wash 3-4 times until the supernatant has changed from a dark green to a pale green. This usually
happens after 3 or 4 resuspensions. At this point, the pellet is typically grayish white and very slippery. The
Triton X-100 in these repeated steps helps to destroy the chloroplasts and mitochondria that contaminate the
prep. Resuspend the nuclei for a final time in a total of 15 ml of H buffer and transfer the suspension to a sterile
125 ml Erlenmeyer flask.



7. Add 15 ml, dropwise, cold 2% Sarkosyl, 0.1 M Tris, 0.04 M EDTA solution (pH 9.5) while swirling gently. This
lyses the nuclei. The solution will become very viscous.

8. Add 30 grams of CsCl and gently swirl at room temperature until the CsCl is in solution. The mixture will be gray,
white and viscous.

9. Centrifuge the solution at 11,400 x g at 4°C for at least 30 min. The longer this spin is, the firmer the protein
pellicle.

10. The result is typically a clear green supernatant over a white pellet, and (perhaps) under a protein pellicle.
Carefully remove the solution under the protein pellicle and above the pellet. Determine the density of the solution
by weighing 1 ml of solution and add CsCl if necessary to bring to 1.57 g/ml. The solution contains dissolved solids
(sucrose etc.) and the refractive index alone will not be an accurate guide to CsCl concentration.

11. Add 20 µl of 10 mg/ml EtBr per ml of solution.

12. Centrifuge at 184,000 x g for 16 to 20 hours in a fixed-angle rotor.

13. Remove the dark red supernatant that is at the top of the tube with a plastic transfer pipette and discard.
Carefully remove the DNA band with another transfer pipette. The DNA band is usually visible in room light;
otherwise, use a long wave UV light to locate the band.

14. Extract the ethidium bromide with isopropanol saturated with water and salt. Once the solution is clear, extract
at least two more times to ensure that all of the EtBr is gone. Be very gentle, as it is very easy to shear the DNA
at this step. This extraction may take a while because the DNA solution tends to be very viscous. If the solution is
too viscous, dilute it with TE.

15. Dialyze the DNA for at least two days against several changes (at least three times) of TE (10 mM Tris, 1 mM
EDTA, pH 8) to remove the cesium chloride.

16. Remove the dialyzed DNA from the tubing. If the dialyzed DNA solution contains a lot of debris, centrifuge the
DNA solution at least at 2500 x g for 10 min. and carefully transfer the clear supernatant to a new tube. Read the
A260 concentration of the DNA.

17. Assess the quality of the DNA by agarose gel electrophoresis (1% agarose gel) of the DNA. Load 50 ng and
100 ng (based on the OD reading) and compare it with known and good quality DNA. Undigested lambda DNA
and a lambda-HindIII-digested DNA are good molecular weight markers.
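
Steps 16 and 17 rely on converting an A260 reading into a DNA concentration and then into loading volumes. The following is a minimal sketch, assuming the widely used approximation that one A260 unit corresponds to about 50 µg/ml of double-stranded DNA in a 1 cm path; that factor, the function names and the example reading are illustrative assumptions, not values from the protocol.

```python
# Assumed conversion factor: 1.0 A260 unit ~ 50 ug/ml double-stranded DNA
# (1 cm path length). This is a common approximation, not a value from the text.
DSDNA_UG_PER_ML_PER_A260 = 50.0


def dsdna_conc_ng_per_ul(a260, dilution_factor=1.0):
    """Estimate dsDNA concentration; ug/ml is numerically equal to ng/ul."""
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor


def load_volume_ul(target_ng, conc_ng_per_ul):
    """Volume of DNA stock (ul) that contains target_ng of DNA."""
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return target_ng / conc_ng_per_ul


if __name__ == "__main__":
    # Hypothetical reading: A260 = 0.25 measured on a 1:20 dilution.
    conc = dsdna_conc_ng_per_ul(0.25, dilution_factor=20)
    for target in (50, 100):  # the two loads named in step 17
        print(target, "ng ->", load_volume_ul(target, conc), "ul")
```

With a concentrated stock the computed volumes can be too small to pipette reliably, in which case an intermediate dilution in TE is the usual workaround.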

Protocol for Digestion of Genomic DNA 

Protocol: 



[2334] 



1. The relative amounts of DNA for different crop plants that provide approximately a balanced number of genome
equivalents are given in Table 3. Note that due to the size of the wheat genome, wheat DNA will be underrepresented.
Lambda DNA provides a useful control for complete digestion.

2. Precipitate the DNA by adding 3 volumes of 100% ethanol. Incubate at -20°C for at least two hours. Yeast DNA
can be purchased and made up at the necessary concentration; therefore no precipitation is necessary for yeast
DNA.

3. Centrifuge the solution at 11,400 x g for 20 min. Decant the ethanol carefully (be careful not to disturb the pellet).
Be sure that the residual ethanol is completely removed either by vacuum desiccation or by carefully wiping the
sides of the tubes with a clean tissue.

4. Resuspend the pellet in an appropriate volume of water. Be sure the pellet is fully resuspended before proceeding
to the next step. This may take about 30 min.

5. Add the appropriate volume of 10X reaction buffer provided by the manufacturer of the restriction enzyme to
the resuspended DNA followed by the appropriate volume of enzymes. Be sure to mix it properly by slowly swirling
the tubes.

6. Set up the lambda digestion control for each DNA that you are digesting.

7. Incubate both the experimental and lambda digests overnight at 37°C. Spin down condensation in a microfuge
before proceeding.

8. After digestion, add 2 µl of loading dye (typically 0.25% bromophenol blue, 0.25% xylene cyanol in 15% Ficoll
or 30% glycerol) to the lambda-control digests and load in 1% TPE-agarose gel (TPE is 90 mM Tris-phosphate, 2
mM EDTA, pH 8). If the lambda DNA in the lambda control digests is completely digested, proceed with the
precipitation of the genomic DNA in the digests.

9. Precipitate the digested DNA by adding 3 volumes of 100% ethanol and incubating at -20°C for at least 2 hours
(preferably overnight).

EXCEPTION: Arabidopsis and yeast DNA are digested in an appropriate volume; they don't have to be precipitated.

10. Resuspend the DNA in an appropriate volume of TE (e.g., 22 µl x 50 blots = 1100 µl) and an appropriate volume
of 10X loading dye (e.g., 2.4 µl x 50 blots = 120 µl). Be careful in pipetting the loading dye; it is viscous. Be sure
you are pipetting the correct volume.



Table 3

Some guide points in digesting genomic DNA.

Species       Genome Size   Size Relative    Genome Equivalent to    Amount of DNA
                            to Arabidopsis   2 µg Arabidopsis DNA    per blot
Arabidopsis   120 Mb        1X               1X                      2 µg
Brassica      1,100 Mb      9.2X             0.54X                   10 µg
Corn          2,800 Mb      23.3X            0.43X                   20 µg
Cotton        2,300 Mb      19.2X            0.52X                   20 µg
Oat           11,300 Mb     94X              0.11X                   20 µg
Rice          400 Mb        3.3X             0.75X                   5 µg
Soybean       1,100 Mb      9.2X             0.54X                   10 µg
Sugarbeet     758 Mb        6.3X             0.8X                    10 µg
Sweetclover   1,100 Mb      9.2X             0.54X                   10 µg
Wheat         16,000 Mb     133X             0.08X                   20 µg
Yeast         15 Mb         0.12X            1X                      0.25 µg
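
The two derived columns of Table 3 appear to follow from the genome sizes and the per-blot amounts. The sketch below reproduces them under that reading: relative size is the genome size divided by the 120 Mb Arabidopsis genome, and the genome equivalent is the per-blot amount expressed relative to 2 µg of Arabidopsis DNA, divided by the relative size. The function and variable names are illustrative, and small differences from the table come from the table's rounding of the relative size.

```python
ARABIDOPSIS_MB = 120.0   # reference genome size from Table 3
ARABIDOPSIS_UG = 2.0     # reference amount of DNA per blot from Table 3

# species: (genome size in Mb, ug DNA per blot, genome equivalent listed in Table 3)
TABLE_3 = {
    "Arabidopsis": (120, 2, 1.0),
    "Brassica": (1100, 10, 0.54),
    "Corn": (2800, 20, 0.43),
    "Cotton": (2300, 20, 0.52),
    "Oat": (11300, 20, 0.11),
    "Rice": (400, 5, 0.75),
    "Soybean": (1100, 10, 0.54),
    "Sugarbeet": (758, 10, 0.8),
    "Sweetclover": (1100, 10, 0.54),
    "Wheat": (16000, 20, 0.08),
    "Yeast": (15, 0.25, 1.0),
}


def relative_size(genome_mb):
    """Genome size expressed as a multiple of the Arabidopsis genome."""
    return genome_mb / ARABIDOPSIS_MB


def genome_equivalents(genome_mb, ug_per_blot):
    """Genome copies loaded, relative to 2 ug of Arabidopsis DNA."""
    return (ug_per_blot / ARABIDOPSIS_UG) / relative_size(genome_mb)


if __name__ == "__main__":
    for species, (mb, ug, listed) in TABLE_3.items():
        print(f"{species:12s} {relative_size(mb):7.2f}X "
              f"{genome_equivalents(mb, ug):5.2f}X (table: {listed}X)")
```

The wheat row makes the note in step 1 concrete: even at 20 µg per blot, wheat contributes under a tenth of the genome copies present in 2 µg of Arabidopsis DNA.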



Protocol for Southern Blot Analysis 



[2335] The digested DNA samples are electrophoresed in 1% agarose gels in 1X TPE buffer. Low voltage, overnight
separations are preferred. The gels are stained with EtBr and photographed.

1. For blotting the gels, first incubate the gel in 0.25 N HCl (with gentle shaking) for about 15 min.

2. Then briefly rinse with water. The DNA is denatured by 2 incubations. Incubate (with shaking) in 0.5 M NaOH
in 1.5 M NaCl for 15 min.

3. The gel is then briefly rinsed in water and neutralized by incubating twice (with shaking) in 1.5 M Tris pH 7.5 in
1.5 M NaCl for 15 min.

4. A nylon membrane is prepared by soaking it in water for at least 5 min, then in 6X SSC for at least 15 min.
before use. (20X SSC is 175.3 g NaCl, 88.2 g sodium citrate per liter, adjusted to pH 7.0.)

5. The nylon membrane is placed on top of the gel and all bubbles in between are removed. The DNA is blotted
from the gel to the membrane using an absorbent medium, such as paper toweling, and 6X SSC buffer. After the
transfer, the membrane may be lightly brushed with a gloved hand to remove any agarose sticking to the surface.

6. The DNA is then fixed to the membrane by UV crosslinking and baking at 80°C. The membrane is stored at 4°C
until use.

B. Protocol for PCR Amplification of Genomic Fragments in Arabidopsis

Amplification procedures:

[2336]

1. Mix the following in a 0.20 ml PCR tube or 96-well PCR plate:

Volume       Stock                                              Final Amount or Conc.
0.5 µl       ~10 ng/µl genomic DNA                              5 ng
2.5 µl       10X PCR buffer                                     20 mM Tris, 50 mM KCl
0.75 µl      50 mM MgCl2                                        1.5 mM
1 µl         10 pmol/µl Primer 1 (Forward)                      10 pmol
1 µl         10 pmol/µl Primer 2 (Reverse)                      10 pmol
0.5 µl       5 mM dNTPs                                         0.1 mM
0.1 µl       5 units/µl Platinum Taq™ (Life Technologies,       1 unit
             Gaithersburg, MD) DNA Polymerase
(to 25 µl)   Water


2. The template DNA is amplified using a Perkin Elmer 9700 PCR machine:

1) 94°C for 10 min. followed by
2) 5 cycles:    94°C - 30 sec, 62°C - 30 sec, 72°C - 3 min
3) 5 cycles:    94°C - 30 sec, 58°C - 30 sec, 72°C - 3 min
4) 25 cycles:   94°C - 30 sec, 53°C - 30 sec, 72°C - 3 min
5) 72°C for 7 min. Then the reactions are stopped by chilling to 4°C.
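
The reaction set-up and thermal profile above reduce to simple arithmetic. The sketch below is illustrative only: it checks the stated final concentrations with C1·V1 = C2·V2, works out the water needed to reach 25 µl, and sums the programmed hold times. It assumes 1 µl for each primer (10 pmol at 10 pmol/µl), and the run-time figure ignores ramping between temperatures, so the real run will be longer.

```python
REACTION_UL = 25.0

# (component, stock concentration, volume in ul); the units of the stock differ
# per row and are given in the trailing comments.
COMPONENTS = [
    ("genomic DNA", 10.0, 0.5),     # ng/ul
    ("PCR buffer", 10.0, 2.5),      # X
    ("MgCl2", 50.0, 0.75),          # mM
    ("primer 1", 10.0, 1.0),        # pmol/ul (1 ul volume is an assumption)
    ("primer 2", 10.0, 1.0),        # pmol/ul
    ("dNTPs", 5.0, 0.5),            # mM
    ("Platinum Taq", 5.0, 0.1),     # units/ul
]


def final_conc(stock_conc, volume_ul, total_ul=REACTION_UL):
    """C2 = C1 * V1 / V2 for one component of the reaction."""
    return stock_conc * volume_ul / total_ul


def water_volume_ul(components=COMPONENTS, total_ul=REACTION_UL):
    """Water needed to bring the reaction to its final volume."""
    return total_ul - sum(volume for _, _, volume in components)


# (number of cycles, [step durations in minutes]) for each programmed block
PROGRAM = [
    (1, [10.0]),             # initial 94 C hold
    (5, [0.5, 0.5, 3.0]),    # 62 C annealing
    (5, [0.5, 0.5, 3.0]),    # 58 C annealing
    (25, [0.5, 0.5, 3.0]),   # 53 C annealing
    (1, [7.0]),              # final 72 C extension
]


def programmed_minutes(program=PROGRAM):
    """Total hold time in minutes, excluding temperature ramps."""
    return sum(cycles * sum(steps) for cycles, steps in program)


if __name__ == "__main__":
    print("MgCl2 (mM):", final_conc(50.0, 0.75))
    print("dNTPs (mM):", final_conc(5.0, 0.5))
    print("water (ul):", round(water_volume_ul(), 2))
    print("hold time (min):", programmed_minutes())
```

The same helper can be used to scale the mix into a master mix by multiplying each volume by the number of reactions plus a small pipetting allowance.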



[2337] The procedure can be adapted to a multi-well format if necessary.

Quantification and Dilution of PCR Products:

[2338]

1. The product of the PCR is analyzed by electrophoresis in a 1% agarose gel. A linearized plasmid DNA can be
used as a quantification standard (usually at 50, 100, 200, and 400 ng). These will be used as references to
approximate the amount of PCR products. HindIII-digested lambda DNA is useful as a molecular weight marker.
The gel can be run fairly quickly, e.g., at 100 volts. The standard gel is examined to determine that the size of the
PCR products is consistent with the expected size and whether there are significant extra bands or smeary products in
the PCR reactions.

2. The amounts of PCR products can be estimated on the basis of the plasmid standard.

3. For the small number of reactions that produce extraneous bands, a small amount of DNA from bands with the
correct size can be isolated by dipping a sterile 10-µl tip into the band while viewing through a UV transilluminator.
The small amount of agarose gel (with the DNA fragment) is used in the labeling reaction.

C. Protocol for PCR-DIG-Labeling of DNA

Solutions:

[2339]

Reagents in PCR reactions (diluted PCR products, 10X PCR Buffer, 50 mM MgCl2, 5 U/µl Platinum Taq Polymerase,
and the primers)

10X dNTP + DIG-11-dUTP [1:5]: (2 mM dATP, 2 mM dCTP, 2 mM dGTP, 1.65 mM dTTP, 0.35 mM DIG-11-dUTP)

10X dNTP + DIG-11-dUTP [1:10]: (2 mM dATP, 2 mM dCTP, 2 mM dGTP, 1.81 mM dTTP, 0.19 mM DIG-11-dUTP)

10X dNTP + DIG-11-dUTP [1:15]: (2 mM dATP, 2 mM dCTP, 2 mM dGTP, 1.875 mM dTTP, 0.125 mM DIG-11-dUTP)

TE buffer (10 mM Tris, 1 mM EDTA, pH 8)

Maleate buffer: In 700 ml of deionized distilled water, dissolve 11.61 g maleic acid and 8.77 g NaCl. Add NaOH to
adjust the pH to 7.5. Bring the volume to 1 L. Stir for 15 min. and sterilize.

10% blocking solution: In 80 ml deionized distilled water, dissolve 1.16 g maleic acid. Next, add NaOH to adjust
the pH to 7.5. Add 10 g of the blocking reagent powder (Boehringer Mannheim, Indianapolis, IN, cat. no. 1096176).
Heat to 60°C while stirring to dissolve the powder. Adjust the volume to 100 ml with water. Stir and sterilize.

1% blocking solution: Dilute the 10% stock to 1% using the maleate buffer.

Buffer 3 (100 mM Tris, 100 mM NaCl, 50 mM MgCl2, pH 9.5). Prepared from autoclaved solutions of 1 M Tris pH
9.5, 5 M NaCl, and 1 M MgCl2 in autoclaved distilled water.
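
The three labeling mixes differ only in how much dTTP is replaced by DIG-11-dUTP. As a rough illustration, the sketch below computes the actual dTTP:DIG-11-dUTP ratio for each mix, which shows that the 1:5 and 1:10 labels are nominal, and picks a mix from the probe length using the size ranges given in the note to the labeling procedure. The function names, and the handling of a probe of exactly 1 kb or 1.8 kb, are assumptions.

```python
# label: (dTTP in mM, DIG-11-dUTP in mM) in the 10X stock
DIG_MIXES = {
    "1:5": (1.65, 0.35),
    "1:10": (1.81, 0.19),
    "1:15": (1.875, 0.125),
}


def dttp_per_dig(label):
    """Molar ratio of dTTP to DIG-11-dUTP in the named mix."""
    dttp, dig = DIG_MIXES[label]
    return dttp / dig


def total_t_analog_mM(label):
    """dTTP plus DIG-11-dUTP; should match the 2 mM of the other dNTPs."""
    dttp, dig = DIG_MIXES[label]
    return dttp + dig


def choose_mix(probe_kb):
    """Pick a mix by probe length: <1 kb, 1 to 1.8 kb, >1.8 kb.

    Treating the boundary values as part of the middle range is an assumption.
    """
    if probe_kb <= 0:
        raise ValueError("probe length must be positive")
    if probe_kb < 1.0:
        return "1:5"
    if probe_kb <= 1.8:
        return "1:10"
    return "1:15"


if __name__ == "__main__":
    for label in DIG_MIXES:
        print(label, round(dttp_per_dig(label), 2), total_t_analog_mM(label))
    print(choose_mix(0.6), choose_mix(1.2), choose_mix(2.5))
```

Longer probes carry more thymidine positions, which is presumably why the mixes with less DIG-11-dUTP are reserved for them.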

Procedure:

[2340]

1. PCR reactions are performed in 25 µl volumes containing:

PCR buffer                   1X
MgCl2                        1.5 mM
10X dNTP + DIG-11-dUTP       1X (please see the note below)
Platinum Taq™ Polymerase     1 unit
10 pg probe DNA
10 pmol primer 1

Note:
                                   Use for:
10X dNTP + DIG-11-dUTP (1:5)       < 1 kb
10X dNTP + DIG-11-dUTP (1:10)      1 kb to 1.8 kb
10X dNTP + DIG-11-dUTP (1:15)      > 1.8 kb

2. The PCR reaction uses the following amplification cycles:



1) 94°C for 10 min.
2) 5 cycles:    95°C - 30 sec, 61°C - 1 min, 73°C - 5 min
3) 5 cycles:    95°C - 30 sec, 59°C - 1 min, 75°C - 5 min
4) 25 cycles:   95°C - 30 sec, 51°C - 1 min, 73°C - 5 min
5) 72°C for 8 min. The reactions are terminated by chilling to 4°C (hold).



3. The products are analyzed by electrophoresis in a 1% agarose gel, comparing to an aliquot of the unlabeled
probe starting material.

4. The amount of DIG-labeled probe is determined as follows:

Make serial dilutions of the diluted control DNA in dilution buffer (TE: 10 mM Tris and 1 mM EDTA, pH 8) as
shown in the following table:

DIG-labeled control DNA starting conc.   Stepwise Dilution     Final Conc. (Dilution Name)
5 ng/µl                                  1 µl in 49 µl TE      100 pg/µl (A)
100 pg/µl (A)                            25 µl in 25 µl TE     50 pg/µl (B)
50 pg/µl (B)                             25 µl in 25 µl TE     25 pg/µl (C)
25 pg/µl (C)                             20 µl in 30 µl TE     10 pg/µl (D)

a. Serial dilutions of a DIG-labeled standard DNA ranging from 100 pg to 10 pg are spotted on a positively
charged nylon membrane, marking the membrane lightly with a pencil to identify each dilution.

b. Serial dilutions (e.g., 1:50, 1:2500, 1:10,000) of the newly labeled DNA probe are spotted.

c. The membrane is fixed by UV crosslinking.

d. The membrane is wetted with a small amount of maleate buffer and then incubated in 1% blocking solution
for 15 min at room temp.

e. The labeled DNA is then detected using alkaline phosphatase conjugated anti-DIG antibody (Boehringer
Mannheim, Indianapolis, IN, cat. no. 1093274) and an NBT substrate according to the manufacturer's
instructions.

f. Spot intensities of the control and experimental dilutions are then compared to estimate the concentration
of the PCR-DIG-labeled probe.
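
The stepwise dilution table and the spot comparison in step f can be sketched as follows. The dilution series uses only the volumes given in the table. The back-calculation of probe concentration assumes that equal volumes of standard and probe dilutions are spotted and that a probe spot is matched by eye to one standard spot; both the matching rule and the function names are illustrative assumptions.

```python
START_PG_PER_UL = 5000.0  # 5 ng/ul DIG-labeled control DNA

# (dilution name, ul taken from the previous solution, ul of TE added)
STEPS = [("A", 1, 49), ("B", 25, 25), ("C", 25, 25), ("D", 20, 30)]


def dilute(conc, sample_ul, diluent_ul):
    """Concentration after mixing sample_ul of stock with diluent_ul of buffer."""
    return conc * sample_ul / (sample_ul + diluent_ul)


def dilution_series(start=START_PG_PER_UL, steps=STEPS):
    """Return {name: concentration in pg/ul} for each stepwise dilution."""
    series = {}
    conc = start
    for name, sample_ul, diluent_ul in steps:
        conc = dilute(conc, sample_ul, diluent_ul)
        series[name] = conc
    return series


def estimate_probe_pg_per_ul(matched_standard_pg_per_ul, probe_dilution_factor):
    """Undiluted probe concentration if a diluted probe spot matches a standard.

    Assumes equal spotted volumes for the standard and the probe dilution.
    """
    return matched_standard_pg_per_ul * probe_dilution_factor


if __name__ == "__main__":
    series = dilution_series()
    print(series)
    # Hypothetical outcome: the 1:2500 probe spot matches standard D.
    print(estimate_probe_pg_per_ul(series["D"], 2500) / 1000, "ng/ul")
```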

D. Prehybridization and Hybridization of Southern Blots

Solutions:

[2341]

100% Formamide    purchased from Gibco

20X SSC           (1X = 0.15 M NaCl, 0.015 M Na3Citrate)
                  per L: 175 g NaCl
                         87.5 g Na3Citrate·2H2O

20% Sarkosyl (N-lauroyl-sarcosine)

20% SDS (sodium dodecyl sulphate)

10% Blocking Reagent: In 80 ml deionized distilled water, dissolve 1.16 g maleic acid. Next, add NaOH to adjust
the pH to 7.5. Add 10 g of the blocking reagent powder. Heat to 60°C while stirring to dissolve the powder. Adjust
the volume to 100 ml with water. Stir and sterilize.



Prehybridization Mix:

Final Concentration   Components          Volume (per 100 ml)   Stock
50%                   Formamide           50 ml                 100%
5X                    SSC                 25 ml                 20X
0.1%                  Sarkosyl            0.5 ml                20%
0.02%                 SDS                 0.1 ml                20%
2%                    Blocking Reagent    20 ml                 10%
                      Water               4.4 ml
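
Each volume in the prehybridization mix follows from the stock and final concentrations, with water making up the remainder. A minimal sketch of that calculation, with illustrative names, which can also be used to scale the mix to a batch size other than 100 ml:

```python
# (component, final concentration, stock concentration) in matching units:
# percent for formamide, Sarkosyl, SDS and blocking reagent, and X for SSC.
PREHYB_MIX = [
    ("Formamide", 50.0, 100.0),
    ("SSC", 5.0, 20.0),
    ("Sarkosyl", 0.1, 20.0),
    ("SDS", 0.02, 20.0),
    ("Blocking Reagent", 2.0, 10.0),
]


def stock_volume_ml(final_conc, stock_conc, total_ml):
    """V1 = C2 * V2 / C1 for one component."""
    return total_ml * final_conc / stock_conc


def prehyb_volumes(total_ml=100.0, mix=PREHYB_MIX):
    """Return {component: ml}, including the water needed to reach total_ml."""
    volumes = {name: stock_volume_ml(final, stock, total_ml)
               for name, final, stock in mix}
    volumes["Water"] = total_ml - sum(volumes.values())
    return volumes


if __name__ == "__main__":
    for name, ml in prehyb_volumes(100.0).items():
        print(f"{name:18s} {ml:6.2f} ml")
```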





General Procedures:

[2342]

1. Place the blot in a heat-sealable plastic bag and add an appropriate volume of prehybridization solution (30 ml/
100 cm²) at room temperature. Seal the bag with a heat sealer, avoiding bubbles as much as possible. Lay down
the bags in a large plastic tray (one tray can accommodate at least 4-5 bags). Ensure that the bags are lying flat
in the tray so that the prehybridization solution is evenly distributed throughout the bag. Incubate the blot for at
least 2 hours with gentle agitation using a waver shaker.

2. Denature DIG-labeled DNA probe by incubating for 10 min. at 98°C using the PCR machine and immediately
cool it to 4°C.

3. Add probe to prehybridization solution (25 ng/ml; 30 ml = 750 ng total probe) and mix well but avoid foaming.
Bubbles may lead to background.

4. Pour off the prehybridization solution from the hybridization bags and add new prehybridization and probe
solution mixture to the bags containing the membrane.

5. Incubate with gentle agitation for at least 16 hours.

6. Proceed to medium stringency post-hybridization wash:

Three times for 20 min. each with gentle agitation using 1X SSC, 1% SDS at 60°C.

All wash solutions must be prewarmed to 60°C. Use about 100 ml of wash solution per membrane.

To avoid background, keep the membranes fully submerged to avoid drying in spots; agitate sufficiently to
avoid having membranes stick to one another.

7. After the wash, proceed to immunological detection and CSPD development.
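
The hybridization volume and the probe amount both scale with membrane area. The sketch below applies the two figures given in steps 1 and 3 (30 ml per 100 cm² and 25 ng of probe per ml); the example membrane dimensions and the function names are hypothetical.

```python
ML_PER_100_CM2 = 30.0     # prehybridization solution per 100 cm2 of membrane
PROBE_NG_PER_ML = 25.0    # DIG-labeled probe concentration in hybridization mix


def hyb_volume_ml(area_cm2):
    """Hybridization solution volume for a membrane of the given area."""
    if area_cm2 <= 0:
        raise ValueError("area must be positive")
    return ML_PER_100_CM2 * area_cm2 / 100.0


def probe_needed_ng(volume_ml):
    """Total probe for the given volume of hybridization solution."""
    return PROBE_NG_PER_ML * volume_ml


def probe_volume_ul(volume_ml, probe_stock_ng_per_ul):
    """Volume of labeled probe stock to add, given its estimated concentration."""
    return probe_needed_ng(volume_ml) / probe_stock_ng_per_ul


if __name__ == "__main__":
    area = 10.0 * 15.0  # hypothetical 10 cm x 15 cm membrane
    vol = hyb_volume_ml(area)
    print(vol, "ml;", probe_needed_ng(vol), "ng probe")
    # Hypothetical probe stock estimated at 25 ng/ul from the dot blot.
    print(probe_volume_ul(vol, 25.0), "ul of probe stock")
```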
E. Procedure for Immunological Detection with CSPD 

Solutions: 
[2343] 

Buffer 1: Maleic acid buffer (0.1 M maleic acid, 0.15 M NaCl; adjusted to pH 7.5 with NaOH)

Washing buffer: Maleic acid buffer with 0.3% (v/v) Tween 20.

Blocking stock solution (10X concentration): 10% blocking reagent in Buffer 1. Dissolve blocking reagent powder
(Boehringer Mannheim, Indianapolis, IN, cat. no. 1096176) by constantly stirring on a 65°C heating block or heat
in a microwave, autoclave and store at 4°C.

Buffer 2 (1X blocking solution): Dilute the stock solution 1:10 in Buffer 1.

Detection buffer: 0.1 M Tris, 0.1 M NaCl, pH 9.5

Procedure: 
[2344] 

1. After the post-hybridization wash the blots are briefly rinsed (1-5 min.) in the maleate washing buffer with gentle
shaking.

2. Then the membranes are incubated for 30 min. in Buffer 2 with gentle shaking.

3. Anti-DIG-AP conjugate (Boehringer Mannheim, Indianapolis, IN, cat. no. 1093274) at 75 mU/ml (1:10,000) in
Buffer 2 is used for detection. 75 ml of solution can be used for 3 blots.

4. The membrane is incubated for 30 min. in the antibody solution with gentle shaking.

5. The membranes are washed twice in washing buffer with gentle shaking. About 250 ml is used per wash for 3
blots.

6. The blots are equilibrated for 2-5 min in 60 ml detection buffer.

7. Dilute CSPD (1:200) in detection buffer. (This can be prepared ahead of time and stored in the dark at 4°C.)
The following steps must be done individually. Bags (one for detection and one for exposure) are generally cut
and ready before doing the following steps.

8. The blot is carefully removed from the detection buffer and excess liquid removed without drying the membrane.
The blot is immediately placed in a bag and 1.5 ml of CSPD solution is added. The CSPD solution can be spread
over the membrane. Bubbles present at the edge and on the surface of the blot are typically removed by gentle
rubbing. The membrane is incubated for 5 min. in CSPD solution.

9. Excess liquid is removed and the membrane is blotted briefly (DNA side up) on Whatman 3MM paper. Do not
let the membrane dry completely.

10. Seal the damp membrane in a hybridization bag and incubate for 10 min at 37°C to enhance the luminescent
reaction.

11. Expose for 2 hours at room temperature to X-ray film. Multiple exposures can be taken. Luminescence continues
for at least 24 hours and signal intensity increases during the first hours.
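
The antibody and CSPD working solutions in steps 3, 7 and 8 are simple dilutions. The sketch below works out the stock volumes for a batch of blots, reading "1:10,000" and "1:200" as one part stock in that many parts of final volume; that reading and the function names are assumptions.

```python
ANTIBODY_DILUTION = 10_000     # Anti-DIG-AP, 1:10,000 in Buffer 2
CSPD_DILUTION = 200            # CSPD, 1:200 in detection buffer
ANTIBODY_ML_PER_3_BLOTS = 75.0
CSPD_ML_PER_BLOT = 1.5


def stock_ul_for(final_ml, dilution_factor):
    """Microliters of stock for final_ml of working solution at 1:dilution_factor."""
    return final_ml * 1000.0 / dilution_factor


def detection_reagents(n_blots):
    """Working volumes (ml) and stock volumes (ul) for n_blots membranes."""
    antibody_ml = ANTIBODY_ML_PER_3_BLOTS * n_blots / 3.0
    cspd_ml = CSPD_ML_PER_BLOT * n_blots
    return {
        "antibody_solution_ml": antibody_ml,
        "antibody_stock_ul": stock_ul_for(antibody_ml, ANTIBODY_DILUTION),
        "cspd_solution_ml": cspd_ml,
        "cspd_stock_ul": stock_ul_for(cspd_ml, CSPD_DILUTION),
    }


if __name__ == "__main__":
    for key, value in detection_reagents(3).items():
        print(key, value)
```

In practice a small excess of CSPD working solution is usually prepared, since each blot is handled individually.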

Example 3: Transformation of Carrot Cells

[2345] Transformation of plant cells can be accomplished by a number of methods, as described above. Similarly,
a number of plant genera can be regenerated from tissue culture following transformation. Transformation and
regeneration [...]

[2346] Suspension cultures of carrot (Daucus carota) cells are established from [...] Early Nantes in B5 growth
medium (O.L. Gamborg et al., Plant Physiol. 45:372 (1970)) plus 2,4-D and 15 mM CaCl2 (B5-44 medium) by methods
known in the art. The suspension cultures are subcultured by adding 10 ml of suspension culture to 40 ml of B5-44
medium in 250 ml flasks every 7 days and are maintained in a shaker at 150 rpm at 27°C in the dark.

[2347] The suspension culture cells are transformed with exogenous DNA as described by Z. Chen et al., Plant Mol.
Biol. 36:163 (1998). Briefly, 4-days post-subculture cells are incubated with cell wall digestion solution containing 0.4
M sorbitol, 2% driselase, 5 mM MES (2-[N-Morpholino] ethanesulfonic acid) pH 5.0 for 5 hours. The digested cells are
pelleted gently at 60 x g for 5 min. and washed twice in W5 solution containing 154 mM NaCl, 5 mM KCl, 125 mM CaCl2
and 5 mM glucose, pH 6.0. The protoplasts are suspended in MC solution containing 5 mM MES, 20 mM CaCl2, 0.5
M mannitol, pH 5.7 and the protoplast density is adjusted to about 4 x 10^[...] protoplasts per ml.

[2348] 15-60 µg of plasmid DNA is mixed with 0.9 ml of protoplasts. The resulting suspension is mixed with 40%
polyethylene glycol (MW 8000, PEG 8000), by gentle inversion a few times at room temperature for 5 to 25 min.
Protoplast culture medium known in the art is added into the PEG-DNA-protoplast mixture. Protoplasts are incubated
in the culture medium for 24 hours to 5 days and cell extracts can be used for assay of transient expression of the
introduced gene. Alternatively, transformed cells can be used to produce transgenic callus, which in turn can be used
to produce transgenic plants, by methods known in the art. See, for example, Nomura and Komamine, Plant Physiol.
79:988-991 (1985), Identification and Isolation of Single Cells that Produce Somatic Embryos in Carrot Suspension
Cultures.

[2349] The invention being thus described, it will be apparent to one of ordinary skill in the art that various
modifications of the materials and methods for practicing the invention can be made. Such modifications are to be
considered within the scope of the invention as defined by the following claims.

[2350] Each of the references from the patent and periodical literature cited herein is hereby expressly incorporated
in its entirety by such citation.

Claims 

1. An isolated nucleic acid molecule comprising a nucleic acid having a nucleotide sequence which encodes an amino
acid sequence exhibiting at least 40% sequence identity to an amino acid sequence encoded by

(a) a nucleotide sequence described in REF and/or SEQ Table 1 or 2 or a fragment thereof; or

(b) a complement of a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof.

2. An isolated nucleic acid molecule comprising a nucleic acid having a nucleotide sequence which exhibits at least
65% sequence identity to

(a) a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof; or

(b) a complement of a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof.

3. An isolated nucleic acid molecule comprising a nucleic acid having a nucleotide sequence which exhibits at least
65% sequence identity to a gene comprising

(a) a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof; or

(b) a complement of a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof.

4. An isolated nucleic acid molecule which is the reverse of the isolated nucleotide sequence according to any one
of claims 1-3, such that the reverse nucleotide sequence has a sequence order which is the reverse of the sequence
order of said isolated nucleotide sequence according to any one of claims 1-3.

45 

5. An isolated nucleic acid molecule comprising a nucleic acid capable of hybridizing to a nucleic acid having a 
sequence selected from the group consisting of: 

(a) a nucleotide sequence which is shown in REF and/or SEQ Table 1 or 2; and 
50 (b) a nucleotide sequence which is complementary to a nucleotide sequence shown in REF and/or SEQ Table 

1 or 2; 

under conditions that permit formation of a nucleic acid duplex at a temperature from about 40^0 and 48°C below 
the melting temperature of the nucleic acid duplex. 

55 

6. The nucleic acid molecule according to any one of claims 1-5. wherein said nucleic acid comprises an open reading 
frame. 
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transcriptoin-factor binding sequence, and any combination thereof. 

specffic expression or root-specific expression of a sequence or any comb.nat.on thereof. 

10 A vector construct comprising a nucleic acid molecule according to any one of claims 1-9. wherein said nucleic 
acid molecule is heterologous to any element in said vector construct. 

11 A vector construct according to claim 10 comprising: 

claims 1-4; 

wherein said first and second nucleic acids are operably linked and wherein said second nucleic acid is heterologous
to any element in said vector construct.

12. The vector construct according to claim 11, wherein said first nucleic acid is native to said second nucleic acid.

13. The vector construct according to claim 11, wherein said first nucleic acid is heterologous to said second nucleic
acid. 

14. A vector construct according to claim 10 comprising: 

(c) a first nucleic acid having the sequence of said isolated nucleic acid molecule according to claim
7; and 

(d) a second nucleic acid; 

wherein said first and second nucleic acids are operably linked and wherein said first nucleic acid is heterologous
to any element in said vector construct. 

15. The vector construct according to claim 14, wherein said first nucleic acid is native to said second nucleic acid.

16. The vector construct according to claim 14, wherein said first nucleic acid is heterologous to said second nucleic
acid. 

17. A host cell comprising an isolated nucleic acid molecule according to any one of claims 1-4, wherein said nucleic
acid molecule is flanked by exogenous sequence.

18. A host cell comprising a vector construct of any one of claims 10-16. 

19. An isolated polypeptide comprising an amino acid sequence

(a) exhibiting at least 40% sequence identity of an amino acid sequence encoded by a sequence shown in 

[illegible]

sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof.

20. The isolated polypeptide of claim 19, wherein said amino acid sequence exhibits at least 75% sequence identity
to an amino acid sequence encoded by a sequence shown in SEQ Table 1 or 2 or a fragment thereof.

21. The isolated polypeptide of claim 19, wherein said amino acid sequence exhibits at least 85% sequence identity
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to an amino acid sequence encoded by a sequence shown in SEQ Table 1 or 2 or a fragment thereof. 

22. The isolated polypeptide of claim 19, wherein said amino acid sequence exhibits at least 90% sequence identity 
to an amino acid sequence encoded by a sequence shown in SEQ Table 1 or 2 or a fragment thereof. 


23. An antibody capable of binding the isolated polypeptide of any one of claims 19-22. 

24. A method of introducing an isolated nucleic acid into a host cell comprising: 

(a) providing an isolated nucleic acid molecule according to any one of claims 1-4; and

(b) contacting said isolated nucleic acid with said host cell under conditions that permit insertion of said nucleic
acid into said host cell. 

25. A method of transforming a host cell which comprises contacting a host cell with a vector construct according to 
any one of claims 10-16.

26. A method of modulating transcription and/or translation of a nucleic acid in a host cell comprising:

(a) providing the host cell of claim 24 or 25; and 
(b) culturing said host cell under conditions that permit transcription or translation.

27. A method for detecting a nucleic acid in a sample which comprises: 

(a) providing an isolated nucleic acid molecule according to any one of claims 1-5; 
(b) contacting said isolated nucleic acid molecule with a sample under conditions which permit a comparison
of the sequence of said isolated nucleic acid molecule with the sequence of DNA in said sample; and

(c) analyzing the result of said comparison. 

28. The method according to claim 27, wherein said isolated nucleic acid molecule and said sample are contacted 
under conditions which permit the formation of a duplex between complementary nucleic acid sequences.

29. A plant or cell of a plant which comprises a nucleic acid molecule according to any one of claims 1-4 which is 
exogenous to said plant or plant cell. 

30. A plant or cell of a plant which comprises a nucleic acid molecule according to any one of claims 1-4, wherein said
nucleic acid molecule is heterologous to said plant or said cell of a plant. 

31. A plant or cell of a plant which has been transformed with a nucleic acid molecule according to any one of claims 1-4.

32. A plant or cell of a plant which comprises a vector construct according to any one of claims 10-16.

33. A plant or cell of a plant which has been transformed with a vector construct according to any one of claims 10-16.

34. A plant which has been regenerated from a plant cell according to any one of claims 29-33. 

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
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first mentioned in the claims, namely claims:
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



The Search Division considers that the present European patent application does not comply with the 
requirements of unity of invention and relates to several inventions or groups of inventions, namely: 

1. Claims: Invention 1: Claims 1-34 all partially 

Nucleic acid molecules and corresponding encoded polypeptide
sequences as defined by SEQ ID NOS: 1-4, and sequences with
identities thereto as defined by the claims, vectors, methods,
hosts, and plants comprising said sequences and antibodies to
said polypeptides



2. Claims: Inventions 2-80256: Claims 1-34 all partially as
appropriate for either nucleic or polypeptide sequences



Nucleic acid or polypeptide sequences as defined by SEQ ID
NOS: 5-80259 such that each consecutive sequence relates to
an individual invention where invention 2 relates to SEQ ID
NO: 5, invention 3 to SEQ ID NO: 6 continuing through to
invention 80256 and SEQ ID NO: 80259, and the corresponding
methods, vectors, hosts and antibodies relating to said
sequences.
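
The numbering scheme described above amounts to a fixed offset between invention number and sequence number. A minimal sketch, with a hypothetical function name, is:

```python
def seq_ids_for_invention(invention: int) -> range:
    """Return the SEQ ID NOs covered by a given invention number.

    Invention 1 groups SEQ ID NOS 1-4; inventions 2 through 80256 each
    cover the single SEQ ID NO equal to the invention number plus 3.
    """
    if invention == 1:
        return range(1, 5)
    if 2 <= invention <= 80256:
        return range(invention + 3, invention + 4)
    raise ValueError("invention number must be between 1 and 80256")
```

Under this mapping invention 2 corresponds to SEQ ID NO: 5 and invention 80256 to SEQ ID NO: 80259, matching the end points stated in the text.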



It should be noted that for the first invention both nucleic
acid and the corresponding encoded polypeptide sequences
have been grouped as one (SEQ ID NOS: 1-4). In view of the
unreasonable burden required to regroup the remaining
nucleic acid sequences with their corresponding encoded
polypeptides, each of these sequences has been communicated
as a separate invention. However, in the event that further
search fees are paid for subsequent inventions, the search
division is willing to regroup sequences on the basis of any
one nucleic acid sequence in combination with its encoded
polypeptide(s), in so far as such sequences have not been
previously disclosed. In effect any subsequent search for any
individual nucleic acid sequence (with its encoded
polypeptides) will be subject to an additional search fee.
Should additional fees be paid then the applicant is kindly
requested to indicate the corresponding encoded polypeptide
sequences for each nucleic acid sequence to be searched.
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